Journal of Applied Psychology 


John G. Darley, Editor
University of Minnesota 





Table of Contents 


James Pertice Porter: 1873-1956: J. G. Darley
Prediction of Soldiers’ Food Preferences by Laboratory Methods: D. R. Peryam and J. G. Haynes
Areas of Radio Preferences: A Preliminary Inquiry: K. Lang
Visibility on Radar Screens: The Effect of CRT Bias and Ambient Illumination: A. A. Smith and G. E. Boyes
“Cloze” Readability Scores as Indices of Individual Differences in Comprehension and Aptitude: W. L. Taylor
Attitudes of White and Negro High School Students in a West Texas Town Toward School Integration: H. Greenberg, A. L. Chase, and T. M. Cannon, Jr.
Some Factors Influencing Income Aspiration: H. C. Ganguli
Factors in Sales Success: D. E. Baier and R. D. Dugan
The Relationship of Typographic Arrangement to the Learning of Technical Training Material: G. R. Klare, W. H. Nichols, and E. H. Shuford
Some Personal and Social Attitudes of Habitual Traffic Violators: H. W. Case and R. G. Stewart
An Operational Test of Laboratory Determined Optima of Screen Brightness and Ambient Illumination for Radar Reporting Rooms: E. G. Bessey and G. S. Machen
An Investigation of Several Methods of Teaching Contour Interpretation: F. J. McGuigan
Seniority and Criterion Measures of Job Proficiency: R. Jay and J. Copes
The Relationship Between Grades and a Predictive Test Battery in the School of Pharmacy of The George Washington University: Suzanne D. Hill
A Comparison of the Academic Aptitude of University Extension Degree Students and Campus Students: H. B. Farnum
Rater Reliability and “Judgmental Demoralization”: A. W. Bendig
A Note on a Punched-Card Method for the Solution of the Chi-Square Contingency Table: E. P. Buckley and G. C. Widding
Erratum





American Psychological Association 


Volume 41, Number 1 February, 1957. 





Consulting Editors

Harold E. Burtt, Ohio State University
Alphonse Chapanis, Johns Hopkins University
Clifford E. Jurgensen, Minneapolis Gas Company
Laurence S. McGaughran, University of Houston
Quinn McNemar, Stanford University
Alexander Mintz, City College of New York
Harold F. Rothe, Fairbanks, Morse and Company
Julian B. Rotter, Ohio State University
Thomas A. Ryan, Cornell University
Donald E. Super, Columbia University
Miles A. Tinker, University of Minnesota
Alfred C. Welch, University of New Mexico

Arthur C. Hoffman, Managing Editor
Helen Orr, Assistant Managing Editor
Editorial Staff: Sadie J. Doyle, Barbara Cummings, Frances Hazer





This journal gives primary consideration to origi- 
nal investigations in any field of applied psychol- 
ogy except clinical and consulting psychology, al- 
though a descriptive or theoretical article may be 
accepted if it represents a special contribution in 
an applied field. Quantitative investigations of in- 
terest or value to psychologists working in the fol- 
lowing broad fields will be considered: vocational 
and educational prognosis, diagnosis, and guidance 
at the secondary and college level; personnel re- 
search in business, industry, and government; bio- 
mechanics; industrial working conditions; research 
on opinion and morale factors; job analysis and 
classification research; market and advertising re-
search.

Because of the large number of manuscripts sub-
mitted, authors should adhere to the rule of
“brevity consistent with clarity.” The typical
manuscript should run to approximately 4,000 
words. There is a lag of approximately twelve 
months between receipt and publication of an 
article. Authors may request advanced publica- 
tion if they are prepared to pay the cost of print- 
ing the necessary extra pages. 


Manuscripts should be addressed to the Editor, 
John G. Darley, 408 Johnston Hall, University of 
Minnesota, Minneapolis 14, Minnesota. All manu- 
scripts should be submitted in duplicate. Original 
figures are prepared for publication; duplicate fig- 
ures may be photographic or pencil-drawn copies. 

Manuscripts must conform to the style require- 
ments described in the “Publication Manual of the 
American Psychological Association,” Psychol. Bull., 
1952, 49, No. 4, Part 2. 





Journal of Applied Psychology 


Published bimonthly by the 
American Psychological Association 
Prince and Lemon Sts., Lancaster, Pa. 
and 1333 Sixteenth Street N.W. 
Washington 6, D. C. 


$8.00 per volume 


$1.50 per issue 


Subscriptions, orders, and business communications should be addressed to the American Psychological Association,
1333 Sixteenth St. N.W., Washington 6, D. C. Address changes must reach the subscription office by the 10th of
the month to take effect the following month. Undelivered copies resulting from address changes will not be replaced;
subscribers should notify the post office that they will guarantee second-class forwarding postage. Other claims for
undelivered copies must be made within four months of publication.
Entered as second-class matter, August 19, 1943, at the post office at Lancaster, Pa., under the act of March 3, 1879.
Acceptance for mailing at the special rate of postage provided for in paragraph (d-2), Section 34.40, P. L. & R.
of 1948, authorized October 10, 1947.


© 1957 by the American Psychological Association, Inc. 








JAMES PERTICE PORTER 
1873-1956 





Journal of Applied Psychology 








Vol. 41, No. 1


FEBRUARY, 1957 








James Pertice Porter 


1873-1956 


James Pertice Porter, emeritus Professor of 
Psychology, Ohio University, died September 
15, 1956, at Swarthmore, Pennsylvania. 

It is uniquely fitting that the Journal of 
Applied Psychology should memorialize the 
closing of his long and dedicated career in 
this field of psychology. From 1921 to 1943, 
Dr. Porter was co-editor or editor of this 
Journal, which had been founded by G. Stan- 
ley Hall at Clark University in 1917. The 
fortieth volume of the Psychological Bulletin 
carried, in its report of the 1943 business 
meeting of the American Psychological Asso- 
ciation, the following statement: “The Jour- 
nal of Applied Psychology was purchased De- 
cember 16, 1942 from James P. Porter, owner 
and publisher.” Thus institutional responsi- 
bility was assumed for the perpetuation of a 
special field that has played such a distin- 
guished and expanding role in American psy- 
chology. 

Dr. Porter, by his editorial skill, faith in 
the new discipline, and long period of teach- 
ing service, provided a major force in this 
phase of psychology’s development. 

After serving as a high school teacher and 
principal, Dr. Porter completed his baccalau- 
reate and master’s degrees at Indiana Univer- 
sity in 1898 and 1901. He received the doc- 
toral degree from Clark University in 1905, 
remaining on the faculty of Clark College,
with ultimate rank as professor of psychology
and dean of the college, until 1922. He ac- 
cepted appointment as head of the depart- 
ment of psychology at Ohio University in 
1922, and remained in this post until his re- 
tirement in 1943. He served as President of 
the Midwestern Psychological Association in 
1941-42, and as President of the Ohio Acad- 
emy of Science in 1935-36. At the time of 
his retirement from Ohio University, his stu- 
dents established there a scholarship fund, 
which bears his name, for the use of under- 
graduates interested in the study of psy- 
chology. 

When history treads close on the heels of 
the present, as has been true in American psy- 
chology, many of us—students, friends, and 
professional acquaintances—feel a deep personal
loss at the passing of the pioneers into
whose orbits we were drawn and whose lives 
touched ours as they transmitted to us their 
enthusiasm, their knowledge, and their sense 
of dedication to the new discipline that was 
to become such a powerful force in American 
higher education and American life. James 
Pertice Porter was of this pioneering group. 
We join his many friends and many former 
students—and his family—in our real sense 
of personal and professional loss at his death.


John G. Darley





Journal of Applied Psychology 
Vol. 41, No. 1, 1957 


Prediction of Soldiers’ Food Preferences by Laboratory
Methods 1


David R. Peryam and John G. Haynes 2


Quartermaster Food and Container Institute, Chicago, Illinois 


Prediction of the acceptability of foods to 
potential consumers has become an important 
problem to the food industries in recent years, 
and is perhaps even more important in plan- 
ning military feeding. The success of a com- 
mercial product may depend on the prefer- 
ences of a loyal minority, but military rations 
must take into account those of the entire 
population of servicemen. The final criterion
of the acceptability of foods must be that of 
consumption, but there are techniques of as- 
sessing acceptability besides the obviously 
valid one of recording eating behavior in the 
normal situation. The most common, the 
most efficient, and probably the most reliable 
is to measure the verbally expressed affective 
responses of a sample of consumers, and from 
these measurements establish the positions of 
various food items on some continuum from
which acceptance behavior may be inferred.
Before this method can be used effectively
certain questions must be answered. What
task should be set for the consumer subjects? 
What kind of experimental situations will call 
forth responses which are valid for predicting 
acceptance? Such problems are particularly 
important to the Armed Services. Military 
consumers are a fairly homogeneous group, 
but the conditions under which rations are 
used vary widely. Many military feeding 
situations are totally inaccessible for conduct- 
ing tests on foods, and others offer varying 
degrees of difficulty. The various types of 
pretesting that are used by the Quartermaster 
Corps may be conveniently classified accord-
ing to whether testing is done in “artificial”
or “natural” situations. The “artificial” situ-
ations include (a) laboratory testing under
controlled conditions using civilian subjects
and (b) soldier-consumer panel testing at
military posts using laboratory-like proce-
dures. The “natural” situations include (a)
normal mess-hall feeding, (b) planned test
exercises where rations are used by selected
groups of soldiers, and (c) regular field ma-
neuvers where rations are used under nontest
conditions.

1 This paper reports research undertaken at the
Quartermaster Food and Container Institute for the
Armed Forces, and has been assigned number 483 in
the series of papers approved for publication. The
views or conclusions contained in this report are
those of the authors. They are not to be construed
as necessarily reflecting the views or indorsement of
the Department of Defense.

2 Present address: Armour Research Foundation,
Chicago, Illinois.

The relative value of these approaches will 
depend upon the criteria by which they are 
judged. If one demands experimental con- 
trol, or is particularly concerned about 
economy of testing, the “artificial” situations 
have the advantage. However, if attention 
is concentrated primarily on test validity, the 
“natural” situations are superior, since one 
is entitled to assume that results become 
more valid as the test situation more closely 
approximates the actual conditions of con- 
sumption, granting, of course, that the test 
population is always a good sample of the 
population of interest. The laboratory method 
is the one most used by the Quartermaster 
Corps. Important decisions as to the selec- 
tion or rejection of items are frequently made 
on the basis of laboratory results alone. 
However, there has been a tendency to dis- 
trust laboratory results and to require addi- 
tional testing in the field. It became appar- 
ent that the lack of knowledge of the true 
value of the various types of pretesting, and
of relationships among them, was retarding 
the ration development program and making 
it unduly expensive. The experiment re- 
ported here represented the initial phase of a 
program of research planned to remedy the 
situation. 

The test subject variable was selected for 
first investigation since it represented one of 
the most obvious differences between labora-
tory tests and any field test conducted with
service personnel. The problem may be
stated as follows: How well do the relative 
preference ratings of foods by groups of sol- 
diers correspond with ratings by groups of 
civilians when the test situations are made to 
correspond as closely as possible? Referring
to the classification scheme above, this repre-
sented comparison of the laboratory and sol-
dier-consumer panel situations.


Procedure 


The laboratory tests were conducted at the Quar- 
termaster Food and Container Institute in the food 
acceptance laboratory, which is especially built for 
running sensory tests on foods. It is secluded, air-
conditioned, and comfortable. Test subjects sit in
panel booths separated from the food preparation 
room. Soldier-consumer panel tests were run at 
Fort Lee, Virginia, in a dining hall which was made 
available between regular meals for that purpose. 

The 12 test foods (see Table 3) were selected so
that distinctly different types would be represented
and so that their preference ratings, as established in
previous laboratory tests, would cover a wide range.
Food materials for the two locations were drawn
from a common source and methods of preparation
were controlled to assure identity. Other control-
lable physical factors, such as the holding time be-
fore serving and the size of samples, were standard-
ized at the two test locations. Time of testing in
relation to regular meal-times was made comparable.

Preference was measured by means of a nine-in- 
terval rating scale, commonly known as the “hedonic 
scale,” which was developed at the Quartermaster 
Food and Container Institute in 1949 (3) and has 
been used extensively with satisfactory results (2, 
4). The questionnaire used at Fort Lee was headed 
by these instructions: 


We want to find out how well certain foods are 
liked by Army men. You will be served three 
samples of food, one after another. As soon as 
you finish each, show how much you liked or dis- 
liked it by marking on the scale underneath the 
name of that food. Then have a drink of water 
and wait for the next sample. Please do not talk 
about the foods during the test. It is important 
to have each man give his own answers—peoples’ 
likes and dislikes are expected to be different.


Three vertically oriented scales were arranged 
across the page below the instructions. Each was 
about five inches long with nine equally spaced in- 
tervals labeled with the following phrases, reading 
from top to bottom: “like extremely,” “like very 
much,” “like moderately,” “like slightly,” “neither 
like nor dislike,” “dislike slightly,” “dislike moder- 
ately,” “dislike very much,” “dislike extremely.” 
The appropriate food name was rubber-stamped
above each scale prior to testing. The form used at
the Institute was identical except that the instruc-
tions were omitted since most of the Institute sub-
jects were already familiar with the method. New
subjects were given oral instructions. None of the 
Fort Lee subjects had ever participated in a test of 
this kind before. 

It was not feasible to serve all 12 foods to one 
person at a single sitting. Experience has shown
that if the number of foods is not strictly limited, 
the ratings of those served later may be affected, 
usually showing a decrement. Therefore, only three 
foods were presented to each subject in each test 
session, so that four sessions were required to test 
one replication of the 12 foods. Combinations of 
foods were established for four replications in such 
a way that no two foods appeared together more 
than once. Four replicates were run at Fort Lee, 
but only replicates 1 and 2 at the Institute.
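
The paper does not say how the food combinations were drawn up. As a sketch only, one construction that meets the stated constraint (12 foods, three per session, four sessions per replicate, four replicates, no two foods together more than once) is shown below. The food labels 0 to 11, the names `build_schedule` and `pairs_unique`, and the use of GF(4) arithmetic are all assumptions made for illustration, not the authors' method.

```python
from itertools import combinations

# Multiplication by a primitive element of GF(4), with field elements
# written as 0..3 and field addition implemented as XOR.
MUL2 = {0: 0, 1: 2, 2: 3, 3: 1}


def build_schedule():
    """Return 4 replicates, each a list of 4 sessions of 3 food labels.

    Foods 0-3, 4-7 and 8-11 form three groups; every session takes one
    food from each group. This is a hypothetical design, not the one
    used in the study.
    """
    schedule = []
    for s in range(4):            # one replicate per field element s
        replicate = []
        for a in range(4):        # one session per field element a
            replicate.append((a, 4 + (a ^ s), 8 + (a ^ MUL2[s])))
        schedule.append(replicate)
    return schedule


def pairs_unique(schedule):
    """True if no pair of foods shares a session more than once."""
    seen = set()
    for replicate in schedule:
        for session in replicate:
            for pair in combinations(sorted(session), 2):
                if pair in seen:
                    return False
                seen.add(pair)
    return True


schedule = build_schedule()
for number, replicate in enumerate(schedule, start=1):
    print("Replicate", number, replicate)
```

The checker `pairs_unique` is independent of the construction, so it can be applied to any proposed set of combinations.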

Forty persons participated in each session at the
Institute. They were selected each time from a pool
of approximately 600 employees of the Chicago
Quartermaster Depot by a standard procedure de-
signed to obtain widespread participation. Most test
subjects participated in only one session and none in
more than two. Fifty soldiers participated in each
Fort Lee session, each group being drawn from a
different company. Selection within companies was
not random since a small number of men were al-
ways unavailable for administrative reasons. Thus,
there was not strict assurance that the groups were
representative of the Army; on the other hand, no
reason was known why their food preferences should
have differed from those of the Army in general.

The Institute test subjects came to the laboratory
in small groups. Each was given a questionnaire,
with additional verbal instructions for those people
who were new. They were seated in the panel
booths and the three test items were presented one
at a time in random order. At Fort Lee all 50 men
were brought into the dining hall at the same time.
They were seated two at a table, where places had
been prepared with questionnaires, water, and neces-
sary utensils, and were briefed by the test monitor
before beginning the test. Again, the food items
were served one at a time in random order.


Results and Discussion 


The index of preference used here was that 
derived by assigning the values 1 to 9 to the 
scale categories, beginning at the “dislike ex- 
tremely” end, and taking the mean of the re- 
sulting distribution of values. The mean rat- 
ing and standard deviation were obtained for 
each food in each replicate. Thus there were 
six sets of ratings—four from Fort Lee and 
two from the Institute. 
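
The scoring rule just described can be sketched as follows. The responses in `sample` are invented for illustration and are not data from the study; the function name `preference_index` is likewise hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the paper does not say which form was used.

```python
from statistics import mean, stdev

# Scale phrases from the questionnaire, ordered from the
# "dislike extremely" end so that list position + 1 is the scale value.
HEDONIC_LABELS = [
    "dislike extremely", "dislike very much", "dislike moderately",
    "dislike slightly", "neither like nor dislike", "like slightly",
    "like moderately", "like very much", "like extremely",
]
SCORE = {label: value for value, label in enumerate(HEDONIC_LABELS, start=1)}


def preference_index(responses):
    """Return (mean rating, standard deviation) for one food."""
    values = [SCORE[response] for response in responses]
    return mean(values), stdev(values)


# Hypothetical responses for one food in one session.
sample = ["like very much", "like moderately", "like extremely", "like slightly"]
index, sd = preference_index(sample)
print(index, round(sd, 3))
```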

Product-moment correlations between sets
of ratings for the 12 foods were obtained for
all possible pairings of individual laboratory
and field replicates and also between sets of
ratings obtained by combining ratings from
the individual replicates. These correlations,
which are predictive validity coefficients in
light of the purpose of the experiment, are
shown in Table 1. (All of the correlations
are positive.) Minimum validity is repre-
sented by the eight correlations grouped to-
gether in the upper lefthand corner of the
table which were derived from the sets of
ratings from single replicates. The remain-
ing correlations all involve combinations of
ratings from more than one replicate and
demonstrate the expected improvement with
increased length of test.

Table 1

Correlations Between Field and Laboratory Mean Ratings for Single Replicates
and Mean Ratings Based on Combined Replicates

(N = 50 for field replicate. N = 40 for laboratory replicate. All correlations positive)

                         Laboratory     Laboratory     Laboratory
Field                    Replicate 1    Replicate 2    Replicates 1 & 2

Single replicate
  1                      .88            .88            .90
  2                      .92            .92            .95
  3                      [illegible]    .78            .87
  4                      .80            .82            .83

Combined replicates
  1 & 2                  [illegible]    .89            .92
  3 & 4                  [illegible]    .85            .86
  1, 2, 3 & 4            .86            .86            .92

Note. Average (Fisher's hyperbolic arc-tangent transformation method) of 8 correlations between single replicates is .86.
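
The averaging method named in the table notes (Fisher's hyperbolic arc-tangent transformation) can be sketched as below: each correlation is transformed to z = atanh(r), the z values are averaged, and the mean is transformed back with tanh. The function name and the two example coefficients are illustrative assumptions, not values taken from the tables.

```python
import math


def fisher_average(correlations):
    """Average correlations through Fisher's z (atanh) transformation."""
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))


# Illustrative coefficients only.
example = [0.80, 0.90]
avg = fisher_average(example)
print(round(avg, 3))
```

For high correlations the result lies somewhat above the arithmetic mean of the coefficients, because the transformation stretches values near 1.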

The correlation between averages of the 
200 field ratings and 80 laboratory ratings 
was + .92. An equation expressing the rela- 
tionship may be written as follows: 


Y (field) = 1.23 X (lab) − 2.30.


The assumption of linearity may be an over- 
simplification, subject to change on the basis 
of more extensive investigation; however, it 
seemed most appropriate for the present data. 
A scatter diagram of the data did not justify 
any other assumption. 
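
As a small worked check of the equation, the sketch below applies it to the laboratory grand mean of 6.43 reported in the Results. The function name `predict_field` is hypothetical; the coefficients are those printed above.

```python
SLOPE = 1.23       # coefficient of X (lab) in the reported equation
INTERCEPT = -2.30  # constant term in the reported equation


def predict_field(lab_mean):
    """Predicted field mean rating for a given laboratory mean rating."""
    return SLOPE * lab_mean + INTERCEPT


lab_grand_mean = 6.43
predicted = predict_field(lab_grand_mean)
print(round(predicted, 2))
```

The predicted value is about 5.61, which agrees with the field grand mean given in the text, as expected if the line was fitted by least squares and therefore passes through the point of means.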

The above equation suggests that the two 
groups of subjects were responding differently 
in ways that affected both level of rating and 
units of discrimination. The grand mean
over all foods for the laboratory was 6.43 as
compared to 5.61 for the field, while the re- 
spective ranges of means were 4.81 and 5.82 
(Table 3). It is apparent that the soldiers 
responded to the low preference foods with 
more frequent and intense “dislike.” The 
soldiers’ comments written on the question- 
naires gave further evidence of this tendency 
to respond more strongly and with fewer in- 
hibitions than the typical laboratory subject, 
and suggested the possibility of differences in 
attitude toward the test situation as well as 
differences in attitude toward the foods. In 
spite of this, however, the high correlation 
shows that differences between foods pro- 
duced differences in evaluation behavior that 
were proportional for the two groups of sub- 
jects. 

Although secondary to validity in this ex- 
periment, test reliability was also considered. 
The “intralocation” correlations between sets 
of ratings provided a single estimate of labo- 
ratory reliability and six estimates of field re- 
liability (Table 2). It was expected a priori 
that the laboratory results would be more 
reliable because of better control in the labo- 
ratory situation. The laboratory correlation 
was .84 while the average field correlation 
was .93; however, only one intralaboratory 
correlation was obtained and this figure may 
not have been generally representative. The 
Spearman-Brown prophecy formula (1) shows 
that to obtain a reliability comparable to that 
in the field, the number of laboratory subjects 
would have to be increased only from 40 to 
120, i.e., considerably fewer than the 200
actually used in the field. 
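
The Spearman-Brown reasoning can be sketched as follows, on the assumption that enlarging the subject group acts like lengthening a test by a factor k. The function names are hypothetical. With the reported values, the factor needed to raise .84 to .93 is about 2.5 (roughly 101 subjects), while tripling the group to 120 predicts a reliability of about .94; the paper does not state how the figure of 120 was rounded, so treating it as three groups of 40 is only one reading.

```python
def spearman_brown(r, k):
    """Predicted reliability when a measure of reliability r is lengthened k times."""
    return k * r / (1 + (k - 1) * r)


def lengthening_factor(r, target):
    """Factor k needed to raise reliability r to the target reliability."""
    return target * (1 - r) / (r * (1 - target))


k = lengthening_factor(0.84, 0.93)
r_tripled = spearman_brown(0.84, 3)
print(round(k, 2), round(40 * k), round(r_tripled, 3))
```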

Table 2

Intercorrelations Among the Four Field Replicates

(N = 50. All correlations are positive)

               Replicate 1    Replicate 2    Replicate 3

Replicate 2    .95
Replicate 3    .94            .92
Replicate 4    .96            .80            [illegible]

Note. Average (Fisher's hyperbolic arc-tangent transformation method) is .93.


Table 3

Mean Preference Ratings and Standard Errors of the Mean for Laboratory and Field

                  Laboratory*
                                SEm            SEm
Food              Mean          Actual         Projected‡     Difference§

Peaches           8.43          .066           .042           [illegible]
Salmon            8.08          .091           .058           .053
Corn              7.14          .144           .091           .005
Corned beef       7.02          .159           .100           .044
Ham and eggs      6.88          .175           [illegible]    .029
Bread             6.75          .154           .097           [illegible]
Carrots           6.66          .177           [illegible]    .076
Sauerkraut        6.31          .236           .149           .002
Cheese bar        5.69          .223           .141           .015
Milk              5.46          .186           .118           .060
Cabbage           5.15          .252           [illegible]    .009
Meat bar          3.62          .230           [illegible]    .015

Grand mean        6.43
Range of means    4.81

* Combined data for two laboratory replicates, N = 80.
† Combined data for four field replicates, N = 200.
‡ Projected to N = 200, assuming no change in variance.
§ Field SEm minus projected laboratory SEm.

Another aspect of reliability is presented in
Table 3, which gives the standard error of the
mean (SEm) for each food. Two figures are
shown for the laboratory. Column 2 gives
the actual value obtained from the distribu-
tion of 80 laboratory ratings and Column 3
projects this figure to N = 200, assuming no
change in variance. For 10 of the 12 foods
the projected laboratory SEm is lower than
the field SEm, which indicates that a labora-
tory retest should reproduce its numerical
indices more accurately than a field retest.
This further suggests that the field reliability
coefficients were higher because of the larger
N and the greater range of the scale utilized
in the field and not because the rating of
each food was more precisely located on the
scale.

Since the results reported here were based
on the testing of only a small number of
foods selected from the hundreds of items
which may be of concern in military feeding,
the possible effects of selection bias should
be considered. The fact that the foods were
not randomly selected detracts from the gen-
eral applicability of the findings. Consumers
tend to like, rather than dislike, the great ma-
jority of items that are available for use in
military as well as civilian feeding. In the
present experiment the test foods were se-
lected to cover a wide range of preference;
hence, there was considerably more loading
with low preference foods than would have
been the case had the items been randomly
selected. At the same time, use of the wider
range of the scale should have improved the
correlation. However, another factor in the
present experiment would have tended to
lower, rather than raise the correlation, if we
may assume that the probability of finding
differences between laboratory and field would
increase as the group of test foods became
more heterogeneous. The attempt was made
to maximize heterogeneity by selecting foods
to cover a wide range of food types so that
there was greater opportunity for differences
to appear than would be the case with ran-
dom selection of test foods.
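
The projection of the laboratory standard errors in Table 3 from N = 80 to N = 200 follows from the standard error of a mean being the standard deviation divided by the square root of N. If the variance is assumed unchanged, as the table footnote states, the projected value is the actual value multiplied by the square root of 80/200. A minimal sketch, with a hypothetical function name:

```python
import math


def project_sem(sem, n_actual, n_target):
    """Rescale a standard error of the mean to a different N, variance unchanged."""
    return sem * math.sqrt(n_actual / n_target)


# Actual laboratory SEm of .066 at N = 80, projected to N = 200.
projected = round(project_sem(0.066, 80, 200), 3)
print(projected)
```

Applied to the actual value of .066, this gives .042, which matches the projected figure printed for the same food.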

Some further limitations on the significance 
of these results for the ration-testing program 
should be noted. First, neither group of test 
subjects was a random sample of a well-de-
fined population. They were merely typical
of what might be expected on a continuing
basis in the two test situations. Further, only 
certain ones of the many possible sources of 
variation between types of pretests were al- 
lowed to operate, e.g., quite a number of fac- 
tors which would affect preferences in nor- 
mal mess-hall feeding may have been entirely 
disregarded. However, this was deliberately 
accepted in designing the experiment. The 
intent was to compare two practical test situ- 
ations where the two types of subjects could 
be reached, controlling only in regard to those 
factors which could be considered incidental, 
such as the rating scale, the number and com- 
bination of samples, and the food materials 
and their preparation and serving. Test sub- 
jects, test location, and certain conditions in- 
separable from test location varied independ- 
ently. Under these conditions, representing 
what is normally attainable, good correlation 
was established. This both supports the in- 
ference that soldiers’ food preferences are the 
same as those of the civilian population and 
demonstrates the practical equivalence of the 
two test procedures. The “intra-” and “inter- 
situation” correlations were of the same or- 
der, which suggests that any noncorrespond- 
ence between test results is just as likely to 
have been due to unreliability of the basic 
method as to differences between the subjects 
or the situations. 

These results have very satisfactory impli- 
cations for the methods of food acceptance 
evaluation currently being used by the Quar- 
termaster Corps. It has been shown that 
laboratory ratings for a series of foods will ac-
curately predict relative preferences as estab-
lished by the soldier-consumer panel method.
The fact that the validity of neither method 
for predicting actual food acceptance has been 
established does not detract from the impor- 
tance of the finding. It represents significant 
progress toward rationalization and integra- 
tion of methods for the pretesting of rations 
and serves as a sound basis for eliminating 
much expensive and unnecessary field testing. 


Conclusions 


The primary conclusion was a practical and 
specific one, namely, that pretesting of ra- 
tions in the Institute laboratory may be 
considered equivalent to pretesting by the 
soldier-consumer panel method. Corollary 
conclusions were: (a) both laboratory and 
field preference ratings have satisfactory re-
liability and (b) the hedonic scale method
is adequate for evaluating food preferences 
under varying conditions. 


Received March 20, 1956 


References


1. Gulliksen, H. Theory of mental tests. New
York: Wiley, 1950.

2. Peryam, D. R. Field testing of Armed Forces
rations. In D. R. Peryam, F. J. Pilgrim, &
M. S. Peterson (Eds.), Food acceptance test-
ing methodology, a symposium. Washington,
D. C.: National Research Council, Advisory
Board on Quartermaster Research & Develop-
ment, 1954. Pp. 75-85.

3. Peryam, D. R., & Girardot, N. F. Advanced
taste test method. Food Engng, 1952, 194,
48-61.

4. Wood, K. R., & Peryam, D. R. Preliminary
analysis of five food preference surveys. Food
Technol., 1953, 6, 248-249.




Journal of Applied Psychology 
Vol. 41, No. 1, 1957 


Areas of Radio Preferences: A Preliminary Inquiry 1


Kurt Lang 


Queens College 


Broadcasters (and people in general) ha- 
bitually classify programs into types. Some 
of these classifications may be no more than 
attempts to delineate areas of administrative 
responsibility. But the more widely used
classifications refer to different aspects of the 
content, format, subject matter, and style 
of presentation. An implicit assumption that 
the categories reflect, in some unspecified 
way, an appeal to some particular audience 
interest usually lies behind these classifica- 
tions. Without a doubt, there is a certain 
pragmatic justification to this assumption. 
But the relationship between content (or 
similar) classifications and the audience needs 
to be investigated, especially the possibility 
that apparently similar programs may exer- 
cise fundamentally different appeals and that 
programs of obviously different content may 
nonetheless appeal to much the same audi- 
ence. 

With some exceptions, the classifications of 
programs in terms of their audience tend to 
take either one of two approaches. The cate- 
gory is sometimes defined in terms of some 
“target” audience, toward whom, among those 
available for listening, the program is pitched,
most commonly “women’s,” “children’s,”
and perhaps “farm” programs. At other
times, some attempt is made to determine 
empirically the educational, age, sex, etc., 
composition of the audience for a particular 
program. The justification of neither ap- 
proach is questioned. There is beyond a 
doubt a daytime audience, different in com- 
position from those available for evening 
listening. Further, knowledge about the age, 
sex, and socioeconomic characteristics of an 
available audience yields excellent clues about 
what is likely to interest them. Both ap-
proaches are important in determining who
listens to what and what listeners want.

1 Grateful acknowledgment is made to the Cana-
dian Broadcasting Corporation and its Director of
Audience Research, Mr. N. M. Morrison, for per-
mission to use their data. The author also would
like to thank Dr. Joseph A. Patton of CBC’s To-
ronto Office for his helpful suggestions.

An alternate approach consists in the deri-
vation of program categories from the listen-
ing habits or the expressed preferences of
listeners. Robinson (5), using 20 program
categories of his own, tried to extract com-
mon factors that would explain the associa-
tion between likings for different types of pro-
grams. Lazarsfeld and his associates (1, 4)
have tried to develop a systematic notion of
“overlap” whereby a common preference for
two types of programs could be given mathe-
matical expression. In these and related
studies, some effort was made to define the
notion of “psychological propinquity” among
pairs of favorite program types. But in deal-
ing with many programs, this approach as-
sumes at the outset that it is possible to
classify them along a single dimension of dis-
tance.2 Further, it is the conclusion of both
Robinson and Lazarsfeld that these investi-
gations would do better to proceed by way of
particular program items, chosen for their
relevance to some problem, rather than by
way of loosely defined types.


Description of Study 


The subsequent analysis is based on data 
obtained during interviews with 497 respond- 
ents in households, randomly selected to rep- 
resent all households in the greater Halifax 
area. It makes use of the answers to an 
inquiry about the respondents’ five favorite 
radio programs.³ We counted all mentions
of favorites, regardless of position, as equal. 
From every case in which a pair of programs 
is mentioned by one respondent, we draw 
support for the generalization that the pro- 
grams are similar in their appeal. This inference of similarity is based on the observation that a person’s preference for one program is compatible with that same person favoring the other as well. We may express the degree of “compatibility” between two programs as that proportion of the audience who, stating a preference for either one of the pair, prefer the two together. In other words, “overlap” is that proportion of persons who express a preference for both. Thus

$$\text{Overlap} = \frac{ab}{A + B - ab},$$

where ab equals the number with a joint preference, and where A and B represent all those naming either one as favorite, jointly with the other or by itself.⁴

2 It is applied to four kinds of music by Lazarsfeld and Kendall (4, p. 32 f.). In a study of magazine reading, the possibility of constructing a continuum of reading levels was specifically investigated (3).

3 The information was obtained before television was available. Interviews constituted the “before” wave of what was to be a before-and-after survey.

The overlap between any pair of favorites 
may be large or small. Theoretically, it may 
vary from 0 (when there are no individuals
listing the pair of favorites) to 1 (when the 
two audiences are identical in that everyone 
naming one also names the other). A high 
overlap thus indicates that the two programs 
appeal pretty much to the same persons. 
Conversely, the smaller the overlap between 
a pair of programs the less compatible is their 
appeal in the sense that one and the same 
person is likely to list both among his fa- 
vorites. 
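
As a rough illustration of the measure just defined, the sketch below computes the overlap from the two mention totals and the joint count, together with the proportion of each program's followers who also name the other. The function names and the joint count of 59 are assumptions made for the example; only the totals of 98 and 94 mentions are figures reported for “Fibber McGee and Molly” and “The Great Gildersleeve.”

```python
from math import sqrt


def overlap(a_total: int, b_total: int, joint: int) -> float:
    """Share of everyone naming either program who names both of them."""
    either = a_total + b_total - joint  # persons naming at least one of the pair
    return joint / either if either else 0.0


def overlap_geometric(a_total: int, b_total: int, joint: int) -> float:
    """Alternative form that divides by the geometric mean of the two audiences."""
    return joint / sqrt(a_total * b_total)


def conditional_shares(a_total: int, b_total: int, joint: int) -> tuple:
    """Proportion of a's followers naming b, and of b's followers naming a."""
    return joint / a_total, joint / b_total


if __name__ == "__main__":
    # 98 and 94 are reported totals; 59 joint mentions is a hypothetical count.
    print(round(overlap(98, 94, 59), 2))            # about 0.44
    print(conditional_shares(98, 94, 59))
    print(round(overlap_geometric(98, 94, 59), 2))
```

Because the denominator counts each person once, the first form stays between 0 and 1 and refers to an actual group of respondents, which is the property the text relies on.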


“Favorites” and Ratings 


One note on the use of “favorites” rather 
than actual listening. What a person listens 
to may be a result of many factors: schedul- 
ing, competition from other programs, house- 
hold and work requirements, general habit, 
and who knows what. Yet a statement of 
preference for a particular program implies a 
more deliberate choice on the part of the lis- 
tener than the mere fact of listening. There 
is of course a relationship between the two, 
for no program is likely to become a subject's 
favorite unless it is first listened to. Those 
citing a program as a favorite may constitute 
a large or a small part of the actual listening 
audience, but it is on the basis of expressed favorites that the popularity of a program will be examined.

4 The above formula is preferred to that used by Lazarsfeld (overlap = $ab / \sqrt{A \times B}$) because it uses a concrete group of persons instead of a geometric average.


For twenty of the most often mentioned 
“favorites,” the rank-order correlation be- 
tween general popularity, in terms of the 
number citing the program, and the program 
rating⁵ yielded a significant rho of .66. Con-
sidering that the preferences are those of in- 
dividuals, whereas the ratings are based on 
sets in use (i.e., households), and considering 
further that both preferences and ratings are 
subject to fluctuation, the relationship seems 
quite definite. Since all the programs used 
in this analysis received frequent mentions, 
these are obviously programs which exercise 
a strong appeal to a fairly large core of loyal 
followers. 
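
For readers who wish to reproduce a rank-order correlation of this kind, a minimal sketch follows. It ranks two lists and correlates the ranks. The article does not say how ties were treated, so the averaging of tied ranks is an assumption, as are the function names and the illustrative ratings, which are not the study's data.

```python
def ranks(values):
    """Rank values from largest (rank 1) to smallest, averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Rank-order correlation: the product-moment correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    mentions = [217, 131, 116, 106, 98]   # mention totals for five programs
    ratings = [30, 25, 28, 12, 15]        # hypothetical ratings, for illustration only
    print(spearman_rho(mentions, ratings))
```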


Compatibility 


The investigation concerning what propor- 
tion of the audience preferring either of a 
pair of programs expresses a joint preference 
for both yields a crude numerical measure of 
overlap. It tells us, in the case of each pair 
of programs, whether they are the favorites 
of the same or different people. 

Table 1 shows every pair of programs with 
an overlap above .12. These are the pro- 
grams with the largest ‘carry-over,’ meas- 
ured in terms of overlapping preferences. A 
figure of .44 thus means that of all the per- 
sons who mention either “Fibber McGee and 
Molly” or “The Great Gildersleeve” (both 
U. S. situation comedy) as their favorite 
nearly one-half also mention the other. This 
means that almost as many people pair these 
programs as their favorites as mention either 
one individually. This is an extremely high 
amount of duplication or overlap among the 
preferences. Even some programs with small 
ratings (e.g., the “Metropolitan Opera”) re-
ceived a fairly large number of mentions as
favorites (by about five per cent of the sam- 
ple), and among a considerable number of 
persons the preferences for a pair of such 
programs overlapped. On the other hand, there are quite a few pairs for which preferences are not at all associated in one and the same person and where, therefore, the overlap is zero. But in every such case the general popularity of each program individually was too small to allow us to attribute this to systematic repulsion rather than to the broad scattering of mentions for the program.

5 Ratings were obtained from listening diaries, which the respondents were asked to keep concurrently with the study. Somewhat over one-half the sample returned diaries. If a program was not broadcast during this period, the most recent rating available from the Elliott-Haynes Rating Reports was used.

From the last two columns in Table 1, it is also possible to see what proportion of those who mention one member of any pair also mention the other. The figure is higher, in each case, for the program receiving the fewer mentions and which is, by implication, less popular. To some extent this is an artifact of the method. While a high numerical value for overlap can be taken as an indication of the compatibility in the preferences, the measure gives only a crude indication. There are two reasons: First, the overlap figure can approach 1 as a potential maximum only if the two members of the pair receive exactly equal numbers of mentions. Thus the proportion among those “liking” the less popular program who also like a more popular program is always greater than the proportion pairing a popular favorite with a relatively less universally liked program (see Table 1). Second, the overlap figure fails to take into account the contribution of the popularity a program enjoys, in terms of the total number of mentions; two very large programs, each with a “majority” audience, would of necessity show overlap. While this indicates compatibility, one should not deduce therefrom any intrinsic similarity in the appeal. A certain overlap would result purely on the basis of chance, even if a certain number of mentions were randomly scattered among persons of all backgrounds and regardless of the fare they demand from radio.
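
Both limitations can be made concrete with a short sketch. The first function gives the ceiling on the overlap for two unequal audiences, reached when every follower of the smaller program also names the larger one. The second gives the overlap that would arise if mentions were scattered independently over the sample. The independence model and the function names are assumptions of this illustration; the mention totals (217, 131, 94) and the sample size of 497 are taken from the study.

```python
def max_overlap(a_total: int, b_total: int) -> float:
    """Ceiling on overlap: the smaller audience lies wholly inside the larger."""
    small, large = sorted((a_total, b_total))
    # joint = small, so overlap = small / (small + large - small) = small / large
    return small / large


def chance_overlap(a_total: int, b_total: int, sample_size: int) -> float:
    """Overlap expected if the two sets of mentions were statistically independent."""
    expected_joint = a_total * b_total / sample_size
    return expected_joint / (a_total + b_total - expected_joint)


if __name__ == "__main__":
    # A program with 217 mentions paired with one of 94 can never exceed this value.
    print(round(max_overlap(217, 94), 2))
    # Two "majority" programs overlap appreciably by chance alone.
    print(round(chance_overlap(217, 131, 497), 2))
```

Under these assumptions the ceiling equals 1 only when the two totals are equal, and the chance figure grows with the popularity of either program, which is why a raw overlap cannot by itself establish similarity of appeal.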
Therefore, the second problem for investigation is whether or not the preferences for any program are evenly scattered among individuals of all tastes and whether or not particular pairs exert an attraction on each other over and above that resulting from the popularity of each by itself.


Table 1

Proportion of Overlap for Program Pairs

                                                                      Proportion    Proportion
                                                                      of a* Who     of b Who
Name of Program                                             Overlap   Mention b     Mention a

a. “Fibber McGee and Molly”/b. “The Great Gildersleeve”       .44        .60           .63
a. “Our Miss Brooks”/b. “Amos and Andy”                        .34        .41           .66
a. “Our Miss Brooks”/b. “People Are Funny”                     .26        .31           .63
a. “Our Miss Brooks”/b. “Lux Radio Theater”                    .25         --           .58
a. “Our Miss Brooks”/b. “The Great Gildersleeve”               .19        .23           .53
a. Toronto Symphony (and symphony)/b. Metropolitan Opera      .18        .25           .43
a. “Our Miss Brooks”/b. “Fibber McGee and Molly”               .17        .21           .51
a. “Amos and Andy”/b. “People Are Funny”                       .17        .26           .43
a. “Amos and Andy”/b. “Fibber McGee and Molly”                 .17        .26           .35
a. “Lux Radio Theater”/b. “The Great Gildersleeve”             .17        .26            --
a. “Lux Radio Theater”/b. “People Are Funny”                   .16        .26            --
a. “Amos and Andy”/b. “The Great Gildersleeve”                 .15        .23            --
a. Abbie Lane/b. Anna Dexter (women’s commentators)           .15        .23            --
a. “Amos and Andy”/b. “Lux Radio Theater”                      .14        .23            --
a. National Hockey League/b. Boxing                           .13        .21            --
a. “The Proctor and Gamble Hour”/b. “Our Gal Sunday”           .13        .18            --
a. “Our Miss Brooks”/b. Edmund Morris’ Newscast                .12        .13            --
a. “Our Miss Brooks”/b. “Boston Blackie”                       .12        .12            --
a. “People Are Funny”/b. “Boston Blackie”                      .12        .15            --
a. Abbie Lane/b. Kate Aitken (women’s commentators)           .12         --            --
a. Kate Aitken/b. Anna Dexter (women’s commentators)          .12        .21            --

* Program “a,” in each instance, denotes the more popular of the pair.
[Entries shown as -- are not legible in this copy.]


Is There a “Majority” Taste?


An inspection of Table 1 will immediately reveal that relatively few, i.e., seven, of the programs make up a majority of the pairs with high observed overlap. These programs are also those most often cited as favorites. They hold first through sixth and ninth place in the frequency of mention. How much of the overlap is accounted for by this popularity?

Table 2 shows the relationship between the observed overlap and that expected as a result of the general popularity of the program, even if preferences were randomly distributed among all persons. In the upper right of each box is a figure showing the observed overlap. The figure in the lower left is an index, which might be called an index of attraction. It designates the ratio of observed to expected overlap. An index larger than 1 thus indicates that more people give joint preferences for the pair of programs than one would expect on the basis of chance. Similarly, an index smaller than 1 indicates that joint preferences for the pair occur less frequently than chance.


Table 2

Attraction Among Seven “Majority” Programs

Row programs (total mentions)†: “Our Miss Brooks” (217); “Amos and Andy” (131); “Lux Radio Theater” (116); “People Are Funny” (106); “Fibber McGee and Molly” (98); “The Great Gildersleeve” (94).
Column programs: “Amos and Andy”; “Lux Radio Theater”; “People Are Funny”; “Fibber McGee”; “The Great Gildersleeve”; “Boston Blackie.”
[Cell entries (observed overlap, upper right; index of attraction, lower left) are not legible in this copy.]

* .01 < p < .05.
** p < .01.
† The number in parentheses after each program in this table as well as in tables to follow indicates the total number of mentions for the program.


For most of these pairs the index exceeds unity. This also holds true for the majority
of pairs formed by any one program. Nor 
are there any pairs for which the index is 
significantly smaller than unity. The signifi- 
cance of some of the differences is undoubt- 
edly kept down by the fact that respondents 
had to mention the program on their own 
and by the small overlap frequencies that re- 
sulted. But, on the whole, the overlap is 
greater than that expected on the basis of 
general popularity and a fair number of pairs 
are significant above chance. 
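
The index of attraction described above can be sketched under one plausible reading, in which “overlap” is taken as the count of joint mentions and the chance expectation treats the two preferences as independent within the sample. The article does not spell out the computation, so this reading, the function names, and the joint count of 7 are assumptions; the totals of 40 and 21 mentions and the sample of 497 are reported figures.

```python
def expected_joint(a_total: int, b_total: int, sample_size: int) -> float:
    """Joint mentions expected if the two preferences were independent."""
    return a_total * b_total / sample_size


def index_of_attraction(joint: int, a_total: int, b_total: int, sample_size: int) -> float:
    """Ratio of observed to chance-expected joint mentions (1.0 = chance level)."""
    return joint / expected_joint(a_total, b_total, sample_size)


if __name__ == "__main__":
    # Two daytime serials with 40 and 21 mentions; 7 joint mentions is hypothetical.
    print(round(index_of_attraction(7, 40, 21, 497), 2))
```

An index above 1 in this sketch means the pair is named together more often than the separate popularity of the two programs would produce, which is the sense in which the text speaks of attraction.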

As far as these data go, they are com- 
patible with the view that liking for any of 
the seven “majority” programs reflects the 
same underlying taste. The preferences are 
compatible with each other, but more than 
that: if a person mentions one of them, this 
increases the likelihood that he will mention 
certain others among them as well. Thus we 
may say that a liking for each of these pro-
grams expresses a particular area of prefer-
ences. They are located contiguous to each
other.⁶

6 There is one other sense in which they are contiguous. “Fibber McGee and Molly” together with “The Great Gildersleeve” are presented in a half-hour package. Likewise, “Boston Blackie” follows “Our Miss Brooks” on the same station, and “Our Miss Brooks” comes right after “People Are Funny,” though on a different station. Undoubtedly, the scheduling plays some role, especially in audience building. While this factor was not subjected to systematic scrutiny, as far as the entire list of programs here investigated is concerned, no persistent relation between adjacent scheduling and the size of the overlap was found.

The next problem is to investigate whether or not these preferences for “majority” programs tend to set themselves off from other, more specialized, areas of taste.


“Minority” Clusters


Not all pairs in Table 1 have one of these almost universally popular favorites as a member. The remaining task will be to locate other clusters of favorites and to determine if programs in the clusters are negatively associated with these “majority” preferences.

Music. The highest overlap besides those made up of “majority” programs is that between the Toronto Symphony and the Metropolitan Opera broadcast. The probability that this high overlap value is due to chance is less than .001. Clearly we are dealing with an audience liking a “serious” musical fare.

Two additional programs on our list of most frequently cited favorites seem to exercise a definite attraction for persons with these musical tastes. They are “Rawhide” (a Canadian CBC program, featuring impersonations and records) and “Stage ’55” (a Canadian CBC series of radio dramas). The matrix in Table 3 shows that this set has high positive index values, half of which are significantly above unity. Therefore it appears that these four programs tap adjacent preferences. They constitute a cluster of favorites. Half of the possible overlap values among these four programs turn out to be significant, even though the “overlap” cells contain extremely few members. Every one of the values exceeded those attributable to frequency of mention by itself.

How is an expressed preference for a program in this cluster related to the “majority” taste? There was no instance of the audience for any one of the four programs in this cluster citing any particular “majority” programs significantly more frequently than the sample as a whole. On the other hand, there were two instances in which majority preferences were significantly less frequently represented among those preferring one or more
programs in this cluster. Among those who 
cite the Toronto Symphony and other radio 
symphonies, there are fewer who mention 
“Our Miss Brooks” (p < .02) and “The
Great Gildersleeve” (p < .01) (both popular 
American comedy situation shows) than in 
the sample as a whole. In general, the pro- 
portion mentioning majority programs was 
smaller than expected, even though devia- 
tions were not statistically significant except 
as indicated above. 

Cases of below-chance overlap raise a spe- 
cial problem of interpretation. Obviously, 
one cannot assume that the failure to include 
a specific program among a person's five or 
so favorites signifies a “low” liking (or dis- 
like). Music lovers may have an especially 
strong interest in radio symphony and the
other radio fare that goes with this prefer- 
ence. Their interest in the specialized fare 
may be far stronger than their interest in the 
popular fare, but this need not involve any 
antipathy toward the latter. The specialized 
fare may displace the popular fare from its 
position of “favorite,” because of its higher 
comparative appeal. But in the absence of 
any absolute measure of liking, one cannot 
conclude that they are less liked as a result. 
The data offer no ground for choosing be- 
tween the alternate explanations; that is, 
whether failure to mention “majority” pro- 
grams spells rejection or not. Still, the fact 
that some of these musical preferences overlap significantly with drama and “comedy” not part of the majority cluster, suggests that there is at least a certain alienation between a “minority taste” and popular fare, in the sense that, if this “music audience” cites drama programs as “favorites,” those cited are not the dramas with the most universal appeal.


Table 3

Mutual Attraction Among Four “Minority” Programs

Programs (total mentions): Toronto Symphony (37); “Rawhide” (31); “Stage ’55” (24); Metropolitan Opera (21).
Legible entries of observed overlap for the Toronto Symphony row: .10, .07, .18.
[The remaining cell entries and the significance footnote are not legible in this copy.]


Inspirational. A second cluster involving 
dissimilar program types appeared between 
those who cited the religious devotions and 
other church programs together with a show 
featuring amateur singing talent (“Singing
Stars of Tomorrow”). This was the only
significant attraction for either member of
the pair, suggesting that some inspirational
appeal, rather than the music, may be
common to both. It is also
possible that the type of song featured, rather 
than the “amateur” aspect, is what makes the 
difference. The amateur program does not 
include current song hits, but features largely 
“established” and operatic music. The pro- 
portion of persons who like another singing 
program of Western songs (“Western Airs”)
is significantly smaller among the audience 
for the amateur singing show than for the 
sample as a whole. Nor is the amateur sing- 
ing show related to other musical programs 
in any consistent way. 

Women’s commentators. The two previous 
clusters contained program pairs which were 
different both with regard to content and 
format. In the case of women’s commenta- 
tors, preferences are clustered around pro- 
grams of similar format. The overlap values 
among three different women’s commentators 
are very high and the probability that they 
are due to chance factors is definitely less 
than one out of 100.⁷

7 This table has been deleted to conserve space.

It is important to observe that having one woman commentator for a favorite does not exclude a preference for others. There is not only compatibility between such a pair of preferences, but if a person mentions one of the commentators, this increases the probability that she will mention one of the others as well. The preferences exercise a mutual attraction over each other, which holds even if we base our calculations only on women who are available for daytime listening. In addition, there are few cases of significant overlap with other daytime programs.

But if one constructs the preference profile 
of the audience for each of the commenta- 
tors, notwithstanding their over-all compati- 
bility, certain suggestions about the differ- 
ence between them emerge. A preference for 
the most popular of the commentators (Abbie 
Lane) seems to have no particular relation to 
any one of the other programs on the list. 
The commentator who is second in number 
of mentions (Kate Aitken) has an especially 
large following among “soap opera” fans. 
The attraction is significant, and it is, more- 
over, significantly greater than that found 
among the following of the third women’s 
commentator (Anna Dexter). The last is 
the one among the three commentators whose 
following shows closest affinity for the ma- 
jority taste, overlapping significantly with two 
“majority” situation comedies (“Fibber Mc-
Gee,” p < .02; and “The Great Gildersleeve,”
p < .01). If we had more detailed informa-
tion, it might be possible to document the 
specific appeals of each in terms of these 
preference profiles. 

“Soaps” and sports. Two similar (and not 
unexpected) clusters were found for a pair of 
popular “soap operas” and for a pair of 
sports programs. The attraction within each 
pair is exercised by programs basically simi- 
lar in format and content. But “soaps” and 
sports as such are not very compatible with 
each other. The “soap operas” are specifi- 
cally addressed to a daytime women’s audi- 
ence. Though available for sportcasts, the 
“soap audience” has a significantly smaller 
share of sports fans than the entire sample. 
We may be dealing with sex differences. 
Nonetheless, the question may be raised, 
whether in terms of their preference profiles, 
“soap addicts” and sports fans constitute 
specialized audiences. 

Consequently we defined the “soap opera” audience to include all persons who mentioned either “soaps” in general or any particular “soap opera” among their favorites. The same was done for the sports audience. There were 134 people in the “soap opera” audience and 124 people in the sports audience so defined. The two are of course negatively related; the overlap is significantly below chance (p < .01). In addition there are no positively related pairs consisting of this “soap opera” audience, on the one hand, and any of the seven “majority” programs, on the other. The index of attraction is below unity for each of them. Four of the seven “majority” programs are cited by significantly (.05 level or above) fewer persons in this audience than in the sample as a whole. By contrast, among sports fans, “favoring” the majority fare tends to be more frequent than the sample proportion, though the difference reaches the .05 significance level with regard to only one program, “Boston Blackie.”

There is some suggestion therefore that the 
“soap” fans constitute a special audience, 
rather than a run-of-the-mill audience who 
incidentally happen to like the “soaps.” Lik- 
ing “soap operas” also works against liking a 
good many programs other than the popular 
favorites. There are several instances where 
a “soap” preference is negatively paired. At 
the same time, there is significant attraction 
between the “soap” listener and those who 
mention one or more quiz programs as their 
favorites (p < .01). The sports audience, on
the other hand, shows no particular disaffinity 
toward any particular type of program, ex- 
cept daytime women’s programs. Radio sports 
interest thus seems much more evenly dis- 
tributed, at least among the male sector of 
the population, than the preferences for most 
other programs. The data show no cluster- 
ing, other than with other sports programs. 


Discussion 


In the foregoing analysis we have at- 
tempted to map out an approach to program 
categories by making use of listeners’ listing 
of their favorites. If one and the same per- 
son can express a preference for a pair of pro- 
grams, this is taken as indicative of program 
compatibility. But beyond that, it is as- 
sumed that the preferences of any one person 
are not randomly scattered throughout all 
areas of interest. When the preferences for 
two programs are more often expressed by 
the same persons than would be expected on 
the basis of general program popularity, we assume that the programs are adjacently located in an area representing all possible radio tastes. This dispenses with the idea of a single continuum, where the level of preferences is sought. Instead, we seek to locate distinct areas of overlap.


Table 4

Mutual Attraction Among Sports and “Soap Opera” Programs

Programs (total mentions): National Hockey League (42); “Proctor and Gamble Hour” (40); Boxing (36); “Our Gal Sunday” (21).
Legible entries for the “Proctor and Gamble Hour” row: observed overlap .13, .01, .04; index of attraction 4.12** for the first entry and .91 for the third.
[The remaining cell entries are not legible in this copy.]

** p < .01.

The findings are compatible with the view 
that there is a definite and distinguishable 
“majority” taste.⁸ Programs receiving the
highest number of mentions as “favorites” 
are also more frequently paired with each 
other than with other programs. The popu- 
larity they thus enjoy is not “universal” but 
lies in their appeal to a more or less clearly 
marked-off area of taste. Majority programs, 
it seems, do not appeal to everyone. Rather 
they appeal to a particular audience, the one 
which represents the taste of the largest num- 
ber of radio listeners. What we are suggest- 
ing is that there is a “great audience,” dis-
tinct from the other not-so-great audiences, 
which has a specifiable set of preferences (6). 
The majority shows appear to tap primarily 
the preferences of this group. 

All of these “majority” programs need not 
necessarily have comparable content. While 
they include a preponderance of situation 
comedy, “serious” drama is also represented. 
But this does not mean that all programs with the same format or the same subject matter need to overlap significantly with the majority taste. The overlap between “Lux Radio Theater,” a popular drama favorite, and “Stage ’55,” a serious “minority” taste contender, is no higher than that based on their individual popularity. It does not appear, therefore, that “majority” shows enjoy their position as favorites because they touch on a conglomeration of all types of taste.

8 The existence of such an attraction explains why the comparative appeal among popular favorites should follow the frequency with which they were mentioned individually (2).

To some extent, this has long been obvious. 
There are always people who resist the ap- 
peals of current fads or who develop strong 
antipathies toward popular favorites. Ma- 
jority audiences, it would follow, are built up 
by pitching the program to a particular audi- 
ence and developing the appropriate formula, 
regardless of subject matter. This is a topic 
which the data can only explore. The limita- 
tion of the data resides, of course, in the 
proportionately small number of mentions for 
many programs. In addition, respondents could only express their preferences. Failure to cite a particular program does not necessarily spell dislike, and below-chance attraction cannot therefore be taken as clear-cut evidence of repulsion. The interest in specialized areas may be stronger than that in “general appeal” programs, but even within a minority audience the latter may be well liked. At any rate, where specialized areas of taste exist (for serious music, for example) the appeal of the specialized fare to this audience would seem to be comparatively greater than that of the generally popular fare.

Because of the methodological limitations, 
the hypothesis about the “majority” audi- 
ence must be considered tentative. An abso- 
lute measure of liking must be used, and thus 
it would be desirable to get more detailed in- 
formation about a specific list of programs. 
With this reservation, however, the approach 
to areas of mass-media taste by way of con- 
sumers’ stated preferences for particular pro- 
grams would appear to be a fruitful line of 
inquiry. 


Received March 22, 1956. 


References 


1. Dunn, S. W. Overlapping of listening among radio audiences. J. Marketing, 1952, 16, 315-321.

2. Gaudet, Hazel. The favorite radio program. J. appl. Psychol., 1939, 23, 115-126.

3. Kass, Babette. Overlapping magazine reading. In P. F. Lazarsfeld & F. N. Stanton (Eds.), Communications research, 1948-49. New York: Harper, 1949. Pp. 130-151.

4. Lazarsfeld, P. F., & Kendall, P. Radio listening in America. New York: Prentice-Hall, 1948. Ch. 2.

5. Robinson, W. S. Preliminary report on factors in radio listening. J. appl. Psychol., 1940, 24, 831-837.

6. Seldes, G. The great audience. New York: Viking, 1950.





Journal of Applied Psychology 
Vol. 41, No. 1, 1957 


Visibility on Radar Screens: The Effect of CRT Bias and 
Ambient Illumination ¹


A. A. Smith and G. E. Boyes 


Defence Research Medical Laboratories, Toronto 


Williams and his co-workers have shown 
(3, 4, 5) that the visibility of radar targets is 
affected by cathode-ray tube (CRT) bias and 
ambient illumination. Their visibility meas- 
ures were stated in decibels of attenuation of 
a reference voltage. A recent analysis (2) 
has made it possible to change such measures 
into percentage of maximum range. The Wil- 
liams data, thus analyzed, indicate that a gain 
in range of at least 25 per cent may be ex- 
pected if optimal values are used. This gain 
is big enough to call for further investigation. 

The Williams studies were done on a rela- 
tively small CRT, of 7” diameter. There was 
some question as to how far their results 
might apply to larger displays. Again, they 
tested CRT bias and room lighting inde- 
pendently. But both factors have to do 
with background brightness: some interaction 
might therefore be expected. Finally, the 
light levels tested by Williams did not fix 
precisely enough an upper limit of permis- 
sible room lighting. 


General Procedure 


The experiment was done in three stages. 
In the first two, we tried to find an optimal 
CRT bias, and to determine its interaction 
with ambient illumination. In the third, we 
sought a precise upper limit for room lighting. 

Apparatus. The display unit was a Plan-Position Indicator (PPI) of standard service pattern, fitted with a 12” CRT. As in the Williams study, the phosphor was of the P-7 type. This two-layer phosphor gives, on initial excitation by electrons, a brief flash of bluish-white light; this initial flash then excites a second layer to longer-lasting yellowish phosphorescence. In a PPI, the picture is that of a circular area scanned by a slowly rotating blue-white radius or “sweep-line”; the sweep-line leaves behind it a gradient of yellow phosphorescence, against which the targets appear as small bright patches, subtending at the usual viewing distance about 20 minutes of visual angle. In operational radars, the picture is complicated by false targets caused by circuit “noise,” and by larger bright areas of “clutter” due to hills and other obstacles. In our experiments, these complications were absent: the display was “noise-free.”

1 Defence Research Medical Laboratories Report No. 163-4, Project No. D77-94-20-22 (H.R. No. 116).

CRT bias was measured by a d.c. voltmeter with 20,000 ohms per volt resistance, connected from cathode to grid. As in all CRTs, the more negative the bias, the weaker is the electron stream which strikes the phosphor screen, and hence the dimmer the sweep-line and its trail of residual phosphorescence. Following Williams and King (3) we defined a Visual Reference Intensity (VRI) as that CRT bias which gave a just-visible sweep-line in foveal fixation after five minutes of dark adaptation. In our experiments, we found VRI to lie within one volt of 44 volts negative, for all Ss. All experimental bias values were recorded in volts positive from VRI; these values represent an ordered scale of sweep-line (and background) brightness. Suitable photometric equipment was not available, and we were therefore unable to convert these bias values into standard units of visual brightness.

The radar display was placed in the centre of a room 9 feet by 10 feet by 10 feet high. The room was as nearly light tight as we could achieve; walls and ceiling were painted a uniform flat white, with a reflectance factor of 84 per cent. A variable light source was mounted 42” off the floor, in a corner of the room to the right rear of S. The source, a 60-watt incandescent bulb in a box with a sliding top, emitted light in the vertical direction. This arrangement functioned as a crude “integrating box,” and eliminated gradients of illumination across the display. Light levels at display center were measured with a Macbeth Illuminometer.

Fig. 1. Target visibility as a function of CRT bias. [Plot of threshold visibility (decibels) against volts bias (from VRI), with one curve for each level of ambient illumination: dark, 0.01 ft-c, 0.1 ft-c, and 1.0 ft-c.]

A close simulation of radar targets was 
achieved through the use of a synthetic tar- 
get generator. This device, designed and 
built for us by Ferranti Electric, can produce 
a single radar target of known size and in- 
tensity at any bearing on a PPI display, and 
in any one of nine radial positions. Control 
of target strength is through a pair of loga- 
rithmically calibrated attenuators; the maxi- 
mum signal pulse, 30 volts at the grid of the 
CRT, can be attenuated in half-decibel steps.
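
To relate an attenuator setting to the signal reaching the CRT grid, the following sketch converts decibels of attenuation into volts. It assumes the usual voltage convention of 20 log10 of the voltage ratio, which the text does not state explicitly; the function names are likewise illustrative. The 30-volt maximum pulse and the half-decibel step are taken from the description above.

```python
MAX_PULSE_VOLTS = 30.0   # maximum signal pulse at the CRT grid
STEP_DB = 0.5            # attenuator resolution


def attenuated_volts(attenuation_db: float, reference_volts: float = MAX_PULSE_VOLTS) -> float:
    """Signal voltage after attenuation, assuming dB = 20 * log10(V_ref / V)."""
    return reference_volts * 10 ** (-attenuation_db / 20)


def attenuator_settings(max_db: float, step_db: float = STEP_DB) -> list:
    """All settings from 0 dB up to max_db in equal steps."""
    steps = int(round(max_db / step_db))
    return [i * step_db for i in range(steps + 1)]


if __name__ == "__main__":
    for db in attenuator_settings(3.0):
        print(db, round(attenuated_volts(db), 2))
```

On this convention each 20 db. of attenuation reduces the pulse to one-tenth of its voltage, so the half-decibel steps are fine relative to the range of thresholds plotted in Fig. 1.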

Visibility thresholds. The dependent vari- 
able in all three experiments was the mini- 
mum signal strength of the target of first ap- 
pearance. Target location was fixed in the 
center of a half-inch square outlined in grease- 
pencil. The sweep-line passed under this 
square every 10 secs. The S was instructed 
to watch the face of the CRT in the marked 
area and, each time the sweep passed, to re- 
port whether or not the target was seen. 
Initial signal strength was set below thresh- 
old, and was increased by 1 db. every 10 
secs. Because the P-7 is a long-persistence 
phosphor, only ascending series could be used. 
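
The ascending series described above can be sketched as a simple loop. The observer is represented by a caller-supplied function, `sees_target`, which is a hypothetical stand-in for the S's report on each sweep; the starting value in the usage lines is likewise invented for illustration.

```python
from typing import Callable, Optional


def ascending_threshold(
    sees_target: Callable[[float], bool],
    start_attenuation_db: float,
    step_db: float = 1.0,
) -> Optional[float]:
    """Return the attenuation (db) at which the target is first reported.

    The series starts below threshold (heavy attenuation) and the signal is
    strengthened by step_db on each sweep, as in the text. Returns None if
    the target is never reported before attenuation reaches zero.
    """
    attenuation = start_attenuation_db
    while attenuation >= 0.0:
        if sees_target(attenuation):  # one report per 10-sec. sweep
            return attenuation
        attenuation -= step_db        # stronger signal on the next sweep
    return None


if __name__ == "__main__":
    # Hypothetical observer who can see the pip at 36.5 db attenuation or less.
    observer = lambda db: db <= 36.5
    print(ascending_threshold(observer, start_attenuation_db=45.0))
```

Because only ascending series are run, the recorded value is the first setting at or beyond the observer's true threshold, so it is resolved no more finely than the 1-db step.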


Experiment I 


Subjects. Five male Ss were used. Two were 
service personnel, experienced in radar operation; 
the other three were members of the scientific staff 
of these laboratories. 

Method. CRT bias and ambient illumination were
varied jointly through four values of each variable.
In a single one-hour session, each S determined one
threshold under each of the 16 conditions. A differ-
ent random order of conditions was used for each S.
The four bias values were VRI, and 5, 10, and 15
volts positive with respect to VRI. The four levels
of ambient illumination were dark (i.e., absence of
any light source other than the glow from the CRT),
and 0.01, 0.1, and 1.0 footcandle.
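
The 4 x 4 design with a separate random order for each S might be laid out as below. This is a sketch only: the text does not say how the random orders were drawn, and the seeds and the coding of darkness as 0.0 footcandle are assumptions made for illustration.

```python
import itertools
import random

BIAS_VOLTS = [0, 5, 10, 15]              # volts positive from VRI (0 = VRI)
ILLUMINATION_FC = [0.0, 0.01, 0.1, 1.0]  # footcandles; 0.0 stands for "dark"


def session_order(rng: random.Random) -> list:
    """One random ordering of the 16 bias-by-illumination conditions."""
    conditions = list(itertools.product(BIAS_VOLTS, ILLUMINATION_FC))
    rng.shuffle(conditions)
    return conditions


# Five Ss, each with an independent order (the seeds are hypothetical).
orders = {s: session_order(random.Random(s)) for s in range(1, 6)}

if __name__ == "__main__":
    for s, order in orders.items():
        print(s, order[:3], "...")
```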

Results. The data are shown in Fig. 1, with CRT 
bias plotted against signal attenuation for each of 
the four ambient light levels. 

The presence of an optimal bias is clearly indi- 
cated. It is not so clear, however, whether this 
optimum is at 5 or 10 volts, or at some intermedi- 
ate value. Analysis of variance showed that the 
effects of both illumination and CRT bias were 
highly significant (p = .001); for illumination, the
effect was primarily due to lower visibility in the
brightest room. There was no significant interac-
tion between bias and lighting.


Experiment II 


Subjects. One female and four male Ss took part.
Again, two were service personnel experienced with 
radar. 

Method. As before, the design involved joint
variation of four ambient light levels and four CRT
biases. Ambient lighting remained at the previously
used values; but the range of CRT bias was nar-
rowed to 5, 7, 10, and 12 volts from VRI.

Fig. 2. Target visibility and CRT bias. [Ordinate: threshold visibility (decibels); abscissa: volts bias (from VRI); one curve for each level of ambient illumination: dark, 0.01, 0.1, and 1.0 ft-c.]

Fig. 3. Target visibility and ambient illumination. [Ordinate: threshold visibility (decibels); abscissa: ambient illumination in foot-candles ("dark," 0.01, 0.1, 1.0); one curve for each CRT bias: 5, 7, 10, and 12 volts from VRI.]

Results. Figures 2 and 3 give the data from this
experiment. Figure 2 shows CRT bias against pip
visibility; Fig. 3 is based on the same data, plotted
as ambient illumination against visibility.

Analysis of variance again showed both bias and
lighting to have significant effects (p = .001); there
was also a slight and barely significant interaction
between the main effects (p = .05).

These data suggest that for optimal pip visibility, 
CRT bias should lie around 7 volts from VRI; and 
that room lighting may be as high as 0.1 footcandle 
without loss. Since, however, there is a considerable 
gap between this value and the succeeding one (1.0 
footcandle), a more precise upper limit for ambient 
lighting was desired.


Experiment III


Subjects. Two female and four male Ss were
used. Only one had had service experience with
radar detection, but the remainder had taken part
in the previous experiments of this series.

Method. CRT bias was fixed at 7 volts from
VRI. Light levels were varied through the three
values of 0.1, 0.2, and 0.3 footcandle. Three ap-
pearance thresholds were determined by each S un-
der each condition.

Results. Figure 4 illustrates the finding. There is
a steady and apparently linear decrement in visibil-
ity from 0.1 to 0.3 footcandle. This decrement is
statistically reliable (p = .01).


Discussion 


Williams has demonstrated (4) that, using 
an empirical function relating CRT bias to 
sweep-line brightness, radar visibility meas-
ures of this type are transformable into the
more usual differential thresholds, such as
those determined by Blackwell (1). Due to
the lack of exact data on the bias-brightness
function for our display, no comparable trans-
formation has been attempted here. The cor-
respondence between our data and those of
Williams suggests that this would be suc- 
cessful, if attempted. Since, however, radar 
equipment is in fact calibrated in electrical 
units, such a transformation would have but 
little practical value. 

The match between our results and those
of Williams, while good, is not exact. Wil-
liams found optimal bias at 5 volts from VRI.
Our optimum is more nearly a range of values,
from 5 to 10 volts, with a suggestion that 7
volts may be somewhat the best. However,
the relative instability of the thresholds from
experiment to experiment—as a comparison
of Figs. 1 and 2 shows—suggests that too pre-
cise a statement of optimal bias is unwar-
ranted. In this sense, our Experiment II
failed of its primary purpose.

Fig. 4. Decrease in target visibility at higher light levels. [Ordinate: threshold visibility (decibels); abscissa: ambient illumination in foot-candles (0.1, 0.2, 0.3).]



Three reasons can be advanced for the lack
of complete correspondence between our results
and those of Williams. We did, of course, use a
larger display; our targets were different in 
size and shape from those used by Williams; 
and we did not use trained observers. (The 
previous radar experience of a few of our Ss 
does not, of course, constitute training in this 
regard.) 

The separate determination of optimal bias 
and room lighting by Williams implies an as- 
sumption that these factors are independent. 
While we did find, in Experiment II, a slight 
interaction between the main variables, this 
was so small that for practical purposes the 
assumption of independence is substantiated. 

Our data on ambient illumination show that 
a level of 0.1 footcandle is at least as favor- 
able as darkness. Williams and Hanes (5) 
reported a slight but nonsignificant advantage 
for light over darkness. This finding was not 
confirmed. 

In conclusion, then, we feel that the present 
experiments confirm the earlier findings in 
most essentials. There is an optimal CRT 
bias (or range of biases) at which radar tar-
gets are most visible; and ambient illumina-
tion can be as high as 0.1 footcandle without
loss. The decrement produced by working in
darkness and at bias values close to VRI, as
against optimal conditions, is of the order of
10 db. of signal voltage. This can mean, ac-
cording to Thornton’s calculations, a loss in
radar range of up to 25 per cent.


Received March 23, 1956 


References 


1. Blackwell, H. R. Contrast thresholds of the hu-
man eye. J. opt. Soc. Amer., 1946, 36, 624-
643.

2. Thornton, G. B. Radar range performance as a
function of CRT operating conditions. DRML
Report No. 163-3, Project No. D77-94-20-22
(H.R. No. 109), 1957. Pp. 9.

3. Williams, S. B., & King, E. The effect of CRT
bias on visibility of targets on a remote PPI.
Systems Research, ONR, SDC Report No.
166-1-6, 1946.

4. Williams, S. B., Bartlett, N. R., & King, E. Visi-
bility on cathode-ray tube screens: screen
brightness. J. Psychol., 1948, 25, 455-466.

5. Williams, S. B., & Hanes, R. M. Visibility on
cathode-ray tube screens: intensity and color
of ambient illumination. J. Psychol., 1949,
27, 231-244.





Journal of Applied Psychology 
Vol. 41, No. 1, 1957 


“Cloze” Readability Scores as Indices of Individual Differences in 
Comprehension and Aptitude ¹


Wilson L. Taylor 


Institute of Communications Research, University of Illinois 


“Cloze procedure” was introduced in 1953 
along with experimental evidence to support 
the conclusion that it is an effective and reli- 
able method of quantitatively pretesting and 
contrasting the relative “readabilities”—for
defined populations of readers—of various 
samples of English prose (4). 

The present article reports an experiment 
aimed at testing the validity of cloze indices 
of readability by determining the degree to 
which the cloze scores of individual subjects 
correspond to independent measures of spe- 
cific knowledge and general aptitude. 

It was assumed at the outset that read- 
ability and comprehensibility are essentially 
synonymous terms. The statement that a 
particular prose passage is, for example, 
“very readable” simply seems to mean that 
it is very easy for some “average” reader to 
understand. 

As will shortly be made more specific, a 
prose sample’s cloze score depends on the in- 
dividual cloze scores of all subjects with re- 
gard to that sample. In turn, an individual’s 
cloze performance appears to depend heavily 
on how well he understands the meaning of 
the materials administered—hence on the fac- 
tors which affect comprehension, such as gen- 
eral language facility, specific knowledge and 
vocabulary relevant to the materials at hand, 
native ability to learn, attention, motivation, 
and so on. 

This experiment specifically tested the hy-
pothesis that the cloze scores of individual
subjects would correlate significantly with
their performances on (a) carefully con-
structed preknowledge and immediate-recall
tests of the content of the material presented,
and (b) a standardized aptitude or “intelli-
gence” test of supposed “ability to under-
stand.”

1 The research summarized here was supported by
Contract AF18(600)335, 670-07, 52-24, between the
Human Resources Research Institute, United States
Air Force, and the Institute of Communications Re-
search, University of Illinois. The author is in-
debted to a number of persons for their assistance,
suggestions, and encouragement. Among them are
Charles E. Osgood, Wilbur Schramm, Charles E.
Swanson, Kalmer E. Stordahl, and Clifford M. Chris-
tensen. All were then on the staff of the Institute
at Illinois.

Also, certain methodological problems were 
explored. They mainly concerned the crea- 
tion of the sampling points on which cloze 
scores are based. 


Cloze Procedure ²


The cloze technique differs radically from 
all of the “element-counting formulas” for 
estimating readability. It assumes that (a) 
the more readable a piece of writing is, the
better understood it will be even if some
words are left out, and (b) the better the
writing is understood, the more likely it is
that a reader can guess what words are
missing.


In operation, one begins by choosing sam- 
ple passages of equal length to be compared, 
then each passage is “mutilated” by deleting 
the same number of words from each. The 
deletion process selects words at random or 
counts out every nth one. The passages are 
reproduced with some standard size of blank 
in place of each missing word, and all muti- 
lated materials are administered to all sub- 
jects in the test group—or to separate and 
presumably similar groups drawn from the 
same population. The subjects are asked (a) 
to guess what the missing words are, and (b) 
to write their guesses in the corresponding 
blanks. 
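
The every-nth deletion just described can be sketched in a few lines. This is a minimal illustration rather than the procedure actually used: the function name `mutilate_every_nth` and the blank marker are hypothetical, and a word is taken to be any run of characters bounded by spaces.

```python
def mutilate_every_nth(text: str, n: int, blank: str = "______"):
    """Replace every nth word of text with a standard-size blank.

    Returns the mutilated passage and the list of deleted words, which
    serves as the scoring key.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    mutilated, deleted = [], []
    for position, word in enumerate(text.split(), start=1):
        if position % n == 0:
            deleted.append(word)
            mutilated.append(blank)
        else:
            mutilated.append(word)
    return " ".join(mutilated), deleted


if __name__ == "__main__":
    passage = "Once upon a time there were three little pigs"
    form, key = mutilate_every_nth(passage, n=4)
    print(form)
    print(key)
```

Every subject receives the same blanks, so the key returned here is what each subject's guesses are later compared against.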


For example, the average reader should have a
good chance of guessing “time” and “little” in “Once
upon a ____ there were three ____ pigs. . . .”
His success on a whole passage of this nature should
be relatively great. But the same subject would be
much less likely to anticipate “apprehend” and “im-
port” in “When theory can ____ rational prob-
lems of ____ to. . . .” And his success on such a
passage would be less.

2 Cloze procedure derives its name from the “clo-
sure” concept of Gestalt psychology. Just as there
is an apparent human tendency to “see” a not-quite-
complete circle as a whole circle—by “mentally clos-
ing the gap” and making the image conform to a
familiar shape—so does it seem that humans try to
complete a mutilated sentence by filling in those
words that make the finished pattern of language
symbols fit the apparent meaning.


The score of any subject on any passage is 
the number of his proposed words that match 
the original ones deleted, and any passage’s 
score is the total of the scores all subjects 
make on it. The passage with the highest 
score is considered “most readable,” the one 
with the second-highest as “next-most read-
able,” etc.—pending the outcome of statisti-
cal tests of the significance of the differences
observed.³
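
The two scores defined above, one for a subject and one for a passage, can be sketched as follows. The sketch simplifies the grading rules: it counts only exact matches and ignores capitalization, and the names `subject_score` and `passage_score` are hypothetical.

```python
def subject_score(answers, key):
    """Number of a subject's filled-in words that match the deleted originals.

    Simplification: matching is exact apart from capitalization; synonyms
    and other word forms earn no credit.
    """
    return sum(a.strip().lower() == k.lower() for a, k in zip(answers, key))


def passage_score(all_answers, key):
    """Total of the scores that all subjects make on one passage."""
    return sum(subject_score(answers, key) for answers in all_answers)


if __name__ == "__main__":
    key = ["time", "little"]
    responses = [["time", "little"], ["Time", "small"], ["day", "little"]]
    print([subject_score(r, key) for r in responses])
    print(passage_score(responses, key))
```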

Quite obviously, the cloze method has little 
in common with standard readability for- 
mulas. It makes no assumptions about cor- 
relations between ease of comprehension and 
the frequencies of occurrence of such “ele- 
ments” as word or sentence length, familiar 
or different words, parts of speech, active 
voice, concrete terms, dependent clauses, etc. 
And it does not count them. The units that 
cloze procedure counts are successful acts of 
reproduction.

This method seems innocent of three of the 
charges Lorge (1) has leveled against the for- 
mulas. He asserts that they do not take ac- 
count of meaning, the sequential nature of 
language expression, or the maturity levels of 
particular subjects or groups. Cloze pro- 
cedure evidently does and must. 

Cloze results apply directly only to the 
subjects and the materials employed, but the 
results may be generalized, of course, to the 
extent that larger defined populations of sub- 
jects and materials are adequately represented 
by the samples used.


Past Findings and Reports 


Only a few scattered references (2, 6) to
cloze procedure have appeared previously in
journals which ordinarily come to the atten-
tion of psychologists, but reports of past ap-
plications of the technique—particularly as
an index of readability—have been published
elsewhere in some detail (4, 5, 7, 8, 9).

3 In conformance with an opinion of Lee J. Cron-
bach, cloze data are treated as “true scores” in sta-
tistical analysis. Cronbach stated that cloze results
appear to satisfy the assumptions for scores, but not
those for frequencies because successive blanks may
not be independent. Semantic and structural factors
tend to make for interdependencies among the words
in a meaningful series such as an ordinary sentence.

An earlier summary of the research de- 
scribed in this article was the dittoed report 
to the supporting agency (3). And an oral 
version, “Aptitude and Comprehension Cor-
relates of ‘Cloze’ Readability Scores,” was
presented by the author in Division 15’s Pa- 
per Session III at the 1956 convention of the 
APA in Chicago. 


Criteria of Aptitude and Comprehension 


The experiment this article describes tested 
the notion that an individual’s cloze score, 
based on mutilated samples of a technical ar- 
ticle on the Air Force system of supply, 
would be a dependable index of (a) his men-
tal ability, (b) how much of that article’s
content he knew before studying it, and (c) 
how much he would know after study. 

Taken to be the criterion of “mental abil-
ity” was the individual’s performance on the
AFQT (Armed Forces Qualification Test). 
AFQT scores, which were already available, 
were expected to correlate highly with cloze 
scores. 

Criteria for (b) and (c), respectively, were
to be the scores the individual made on two 
matched multiple-choice tests developed out 
of the article. Other researchers had already 
constructed these tests by standard item- 
analysis methods; trials had shown the tests 
to be highly reliable and to yield similar 
means and variances when the study vari-
able was held constant.⁴

To distinguish them from cloze forms, both
of these matched tests will be referred to
hereafter as “comprehension” tests. What
will be called the “pretest of comprehension”
was used to determine how much of the ar-
ticle’s content a subject already knew; the
“immediate recall” test was given after study
of the article. The difference between the
scores of an individual on the two tests was
considered attributable to learning during
study.

4 The experimenter is particularly indebted to two
persons, already named in footnote 1; they were
responsible for the construction of the matched
comprehension tests, without which the experiment
reported here could not have been conducted. Re-
ferred to are Christensen, now at the University of
Arkansas, and Stordahl, at Arkansas Polytechnic
College.


Cloze Materials 


Cloze forms were based on only a 20% 
sample of the article, which was about 3,240 
words in length and had been reprinted in
folder form for experimental use. The article 
contained 360 lines of type, and eight sub- 
samples, each nine lines long, were mechani- 
cally selected. The first nine lines consti- 
tuted the first subsample, then 41 lines were 
skipped; the next nine lines became the sec- 
ond subsample, then 41 lines were skipped, 
and so on. The eight subsamples and the in- 
tervening skips totaled 359 lines, and the 
360th was discarded. 
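
The mechanical selection of subsamples can be checked with a short sketch. Line numbers are counted from 1, and the function name `subsample_ranges` is hypothetical; the constants are those given in the text.

```python
TOTAL_LINES = 360      # lines of type in the article
SUBSAMPLE_LINES = 9    # lines taken for each subsample
SKIPPED_LINES = 41     # lines passed over after each subsample
N_SUBSAMPLES = 8


def subsample_ranges():
    """First and last line number (1-based, inclusive) of each subsample."""
    ranges, start = [], 1
    for _ in range(N_SUBSAMPLES):
        ranges.append((start, start + SUBSAMPLE_LINES - 1))
        start += SUBSAMPLE_LINES + SKIPPED_LINES
    return ranges


if __name__ == "__main__":
    ranges = subsample_ranges()
    sampled = sum(last - first + 1 for first, last in ranges)
    print(ranges)
    print(sampled, sampled / TOTAL_LINES)  # lines sampled and their share
```

The eight subsamples cover 72 of the 360 lines, which is the 20% sample stated above, and the last subsample ends on line 359, leaving the 360th line unused.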

The total sample thus extracted was muti-
lated by three somewhat different methods to
produce “any,” “easy,” and “hard” cloze
forms. The purpose of this was methodo-
logical—to explore a question asked by some
critics of the introductory article on cloze
procedure (4): “Wouldn’t sampling points
created by deleting only ‘important’ words,
such as nouns or verbs, yield more discrimi-
nating results than the practice of counting
out words without regard for their differing
functions?”

To construct the “any” form, any and all 
words were considered equally liable to dele- 
tion. 

To devise “hard” and “easy” forms, all of 
the approximately 650 words in the sample 
were preclassified. This operation was based 
on the findings of an exploratory analysis
which utilized all previously collected cloze 
data and a set of “functional” categories of 
parts of speech. 


It was assumed that the use of a word in context 
should govern its category assignment. A word em- 
ployed as an adjective was classified as such even if 
its form was that of a noun. Likewise, “to” was 
considered a preposition when it took an object, but 
as a verb auxiliary in “to go.” 

By counting the successes and failures in past data,
it was determined that adverbs, verbs, and nouns
had been most frequently guessed wrong, hence were
“hard.” Verb auxiliaries, conjunctions, pronouns,
and articles had been most often guessed right, hence
were “easy.” Adjectives (not including articles)
and prepositions fell into a “medium” group.


All “hard” and “easy” words in the sample 
were identified. One mutilation operation 
took only the former into account, and an- 
other operation only the latter. “Medium” 
parts of speech were excluded from consid- 
eration. 

Except for the kinds of words each con-
sidered “deletable,” all three deletion opera-
tions were alike. A word was defined as any
group of language symbols (letters, figures,
and related punctuation) separated from
neighboring groups by empty spaces, and
all punctuation not part of a deleted word
was retained. Each operation utilized a
“variable every nth” system of mechanically
knocking out 10 words in every subsample;
hence the total number of blanks in (and the
total possible individual cloze score for)
every version was 80.


The “variable every nth” system took account of
the fact that some subsamples included more de-
letable words than others. The number of deletable
words (whether “any,” “easy,” or “hard”) in a sub-
sample was divided by 10, and the whole number
of the quotient was taken as the value of n. If, for
example, 86 words were involved, that number was
divided by 10 to yield 8.6, and every eighth word
was deleted. In a few cases the total number of
deletable “hard” or “easy” words in a subsample
dropped below 20, hence the whole number of the
quotient became one. Then, rather than just take
the first 10 deletable words and leave the last half
or third of the subsample free of blanks, the dis-
tribution of deletions was determined by reference
to a table of random numbers. But two blanks
were not allowed to occur in sequence; if the ran-
dom method chose a word immediately following
one already deleted, the next deletable word was
arbitrarily selected instead.
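
The “variable every nth” rule can be sketched as below. Several points are assumptions made for illustration: deletable words are represented by their positions in the running text of a subsample, a pseudo-random generator stands in for the table of random numbers, and a candidate is treated as blocked when it stands next to an existing blank on either side. The name `choose_deletions` is hypothetical.

```python
import random

BLANKS_PER_SUBSAMPLE = 10  # from the text


def _blocked(position, chosen):
    """True if the position is already blank or adjoins an existing blank."""
    return position in chosen or position - 1 in chosen or position + 1 in chosen


def choose_deletions(deletable, rng, k=BLANKS_PER_SUBSAMPLE):
    """Pick k text positions to blank out from the sorted deletable positions."""
    n = len(deletable) // k          # whole number of the quotient
    if n < 1:
        raise ValueError("fewer deletable words than blanks required")
    if n >= 2:
        # Every nth deletable word: the nth, the 2nth, ..., the (k*n)th.
        return [deletable[i * n - 1] for i in range(1, k + 1)]
    # n == 1: spread the blanks at random instead of taking the first k words.
    chosen = set()
    for _ in range(10_000):          # bounded retries keep the sketch finite
        if len(chosen) == k:
            break
        j = rng.randrange(len(deletable))
        while j < len(deletable) and _blocked(deletable[j], chosen):
            j += 1                   # move on to the next deletable word
        if j < len(deletable):
            chosen.add(deletable[j])
    if len(chosen) < k:
        raise ValueError("could not place all blanks without adjacency")
    return sorted(chosen)


if __name__ == "__main__":
    # 86 deletable words, as in the text's example: every eighth is taken.
    print(choose_deletions(list(range(86)), random.Random(0)))
```

With 86 deletable words the rule yields the 8th, 16th, and so on through the 80th, which gives the ten blanks required for each subsample.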


Two cover pages, each with a different ex- 
planatory foreword, also were prepared. One 
foreword was written for “before” cloze tests, 
all three forms, and the other for “after”
tests. All mutilated materials then were
mimeographed and securely stapled to make
“any,” “easy,” and “hard” assemblies. Each
subject was to get exactly the same assem-
bly the second time as the first, except for
the differing cover sheet. 










Subjects and Administration 


All subjects were Air Force trainees as- 
signed to three “training flights,” No. 2564, 
No. 2565, and No. 2566, at Sampson Air 
Base, New York. There were about 58 air-
men in each flight, or 174 altogether. Of
them, however, only 152 were represented in 
final data. Analysis considered only those 
subjects who (a) were present at both experi- 
mental sessions of their flight, (b) took and 
completed both “before” and “after” cloze 
tests and both comprehension tests (four 
tests altogether), and (c) had AFQT scores 
on record.

Instructions for test administrators, pro-
vided by the air base, required that all three
kinds of cloze forms with “before” forewords
be randomly and about equally distributed
among those subjects present at the first
experimental session of each of the three
flights.⁵

All airmen were asked to enter their names 
and serial numbers in spaces provided, and 
these entries were used to make sure that (a) 
each subject received the same kind of form 
(with an “after” foreword) in the second ex-
perimental session, and (b) his performances
could be paired. 

Test administrators were instructed to read 
aloud the foreword relevant to each session 
at the beginning of that session, to answer 
questions regarding the directions subjects 
received, and to keep order while the forms 
were filled out. They also were told to say 
nothing to cause subjects to anticipate any 
experimental step still to come. 

All three flights held their first sessions on
the same day; the second session for one
flight came seven days later and those for
the remaining two flights eight days later.
Every first session consisted of two parts,
each about 40 minutes long; the “before”
cloze tests were administered, and the pretest
of comprehension followed. Every second
session had three such parts: (a) Subjects
were given copies of the complete supply ar-
ticle to study, (b) the immediate recall test
of comprehension was given, and (c) the
not-previously-anticipated “after” cloze as-
semblies were administered.

5 The possibility of simply assigning one kind of
form to each flight was avoided. It was impossible
to know in advance whether each flight would con-
tain about the same number of men. There was no
way of knowing that the different flights would be
equal in language ability, intelligence, and experience.
Further, possible differences in test administrators,
times of day tests were given, etc., might have be-
come contaminating variables.


Screening and Scoring 


Before the cloze tests were scored, all data 
from 22 subjects were eliminated. 

Inspection and pairing by subject showed 10 had 
been absent for either the “before” or the “after” 
cloze test. Six more subjects—apparently because 
they were almost wholly illiterate in English— 
failed to complete the “before” test; they left the 
last quarter to half of the blanks unfilled. Two 
others either did not take or did not complete both 
of the comprehension tests. Finally, data on four 
other subjects were discarded because AFQT scores 
were not reported for them.


The remaining 152 subjects were found to 
be distributed fairly equally within both the 
different-flights and the kinds-of-assembly di- 
mensions. Three flights multiplied by three 
assemblies made for nine subgroups, and the 
Ns within the subgroups ranged only from 15 
to 18. 

Only those filled-in words which matched 
the original words were counted as right. 
The singular form of the correct word did 
not count for the plural, nor a common spell- 
ing for a technical one, nor an abbreviation 
for a written-out form. And no credit was 
given for synonyms. What graders consid- 
ered to be obviously careless misspellings of 
obviously right word forms were not penal- 
ized, however, and no attention was paid to 
capitalization or the omission of internal
punctuation. 


Results and Analyses ⁶


1. Cloze vs. Comprehension: The first four
lines of entries in Table 1 show highly signifi-
cant (all Ps less than .001) and positive cor-
relation coefficients between all paired dis-
tributions of cloze and comprehension scores;
the coefficients range from .51 to .92. Not
only did neighboring-in-time test scores (“be-
fore” cloze with pretest of knowledge, “after”
cloze with immediate recall) correlate signifi-
cantly, but the scores on each kind of pre-
liminary test dependably predicted scores
made on the other kind of test separated
from it by a seven- to eight-day interval and
by study of the article.

6 Throughout this report, all inferences about the
significance levels of the findings are based on two-
tailed tests.


Table 1

Product-Moment Correlation Coefficients Among Cloze, Comprehension, AFQT Scores

                                          Groups by Cloze Versions
                                        “Any”       “Easy”      “Hard”
Arrays Compared                         N = 48      N = 52      N = 52

Cloze vs. Comprehension
  (Neighbor Tests)
    “Before” vs. Pretest                 .70*        .58*        .92*
    “After” vs. Immediate Recall         .80      [illegible]    .80
  (Separated Tests)
    “Before” vs. Immediate Recall     [illegible] [illegible] [illegible]
    “After” vs. Pretest               [illegible] [illegible] [illegible]

Compared to AFQT
    Cloze “Before”                    [illegible] [illegible] [illegible]
    Cloze “After”                     [illegible] [illegible] [illegible]
    Comprehension Pretest             [illegible] [illegible] [illegible]
    Comprehension Immediate Recall    [illegible] [illegible] [illegible]

Reliability
    Cloze: “Before” vs. “After”          .88         .80         .84
    Comprehension:
      Pretest vs. Immediate Recall       .83         .74         .74

[Entries that survive for the Separated Tests and AFQT rows, without a recoverable cell position: .76†, .51†, .78, .73‡.]

Note. All coefficients shown are positive; all exceed the values (about .46 for “any” and .44 for “easy” and “hard”) needed to make them significantly different, to above the .001 level, from zero.
* The r of .92 significantly different from both .70 and .58 to .01 level.
† Significantly different to .05 level.
‡ Significantly different to .05 level.

2. Correlations with AFQT: The middle 
section of Table 1 shows highly significant 
and positive coefficients (from .46 to .74) 
between AFQT indices and 2 × 3, or 6, varie-
ties of cloze scores. The corresponding co-
efficients between AFQT and comprehension
scores range from .49 to .70. 

An individual’s AFQT score is a combina-
tion based on a number of indices, two of
which are WK (word knowledge) and AR
(arithmetical reasoning) and are reported in
“Sta-9” scores. Additional analysis showed
that “any” scores, “before” and “after,” re-
spectively, correlated .85 and .82 with WK
and .70 and .76 with AR. Corresponding re-
sults for “easy” were .65 and .54, .59 and
.57; and for “hard,” .70 and .76, .51 and .61.

3. Test-Retest Reliabilities: Entries at the
bottom of Table 1 display test-retest reliabil-
ity coefficients for the three kinds of cloze
forms that are uniformly large: .88, .80,
and .84, respectively, for “any,” “easy,”
and “hard.” Corresponding values for the
matched comprehension tests are .83, .74,
and .74.

4. “After”-Minus-“Before” Measures of
Learning: Entered in Table 2 are explora-
tory findings relative to the notion that cloze
forms, like matched comprehension tests,
might be used to measure learning. Pre-
sented are the means and standard deviations
of each group’s performances and the differ-
ences, the mean gains in score, between paired
performances (on each cloze form and on the
comprehension tests) separated by the seven-
to eight-day time interval and by study of
the original textual material on which all
these tests were based. All differences are
associated with highly significant t values
(the Ps are less than .001). The ts asso-
ciated with “any” and “hard” cloze forms
appear to exceed those yielded by the com-
prehension tests for the same groups.
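
A t value for the gain between paired performances can be sketched as follows. The text says only that the means were paired and correlated, so the use of the ordinary difference-score formula is an assumption, and the scores in the usage lines are invented for illustration; they are not data from the experiment.

```python
import math
import statistics


def paired_t(before, after):
    """Mean gain and t for paired scores, computed from difference scores.

    t = mean(d) / (s_d / sqrt(N)), with N - 1 degrees of freedom.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally long lists of at least 2 scores")
    diffs = [a - b for b, a in zip(before, after)]
    mean_gain = statistics.mean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_gain, mean_gain / standard_error


if __name__ == "__main__":
    # Hypothetical "before" and "after" scores for four subjects.
    gain, t = paired_t([10, 12, 9, 14], [13, 15, 11, 18])
    print(gain, round(t, 3))
```

Working from difference scores takes the correlation between the two administrations into account, which is why a test with high retest reliability can yield a large t for a modest mean gain.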

5. Homogeneity of Experimental Groups:
As shown by entries at bottom of Table 2,
Bartlett’s test for homogeneity of variance
yielded no chi-square value which indicated
significant heterogeneity among the three
population samples. The groups have highly
similar means and standard deviations on the
comprehension and AFQT tests.

6. “Any,” “Easy,” and “Hard” Compared: 
Although the correlation coefficients and the 
learning-gain ts associated with all three cloze 
forms were found to be highly significant in 
every instance, the “easy” form did, in gen-
eral, yield results that differ from both “any”
and “hard” results. Also, “any” and “hard” 
differed somewhat from each other. 

As expected, subjects were considerably 
more successful in guessing easy-to-replace 
words than less easy ones; the “easy” means
in Table 2 are markedly larger than those for 
the other cloze forms. Also, the learning-gain 
difference for “easy” is significantly smaller 
(P less than .01) than the gain for either 
“any” or “hard.” And the “easy” coeffi- 
cients in Table 1 in no case equaled or ex- 
ceeded a corresponding “any” value and in 
only one instance (“before” cloze vs. AFQT) 
a “hard” one. 


Hard-to-replace words were associated with 
somewhat smaller means than were “any” 
words, but Table 2 shows that the two forms 
yielded learning gains that are virtually equal. 

Real differences between “any” and “hard”
performances appear in Table 1’s exhibit of 
coefficients. In a single instance, “before” 
vs. pretest, “hard” correlated higher than 
“any”; the coefficient of .92 between “hard” 
scores and the pretest of knowledge was sig- 
nificantly larger (P less than .01) than 
“any’s” .70. 
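
A comparison of two independent correlation coefficients, such as .92 against .70, can be sketched with Fisher's z transformation. The text does not say which test was applied, so this method is an assumption, and the function name `fisher_z_difference` is hypothetical.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Normal deviate and two-tailed P for the difference of two independent rs.

    Each r is transformed by z = atanh(r); the difference of the two zs has
    standard error sqrt(1/(n1 - 3) + 1/(n2 - 3)).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    deviate = (z1 - z2) / standard_error
    p_two_tailed = math.erfc(abs(deviate) / math.sqrt(2.0))
    return deviate, p_two_tailed


if __name__ == "__main__":
    # "Hard" (N = 52) against "any" (N = 48), "before" cloze vs. pretest.
    deviate, p = fisher_z_difference(0.92, 52, 0.70, 48)
    print(round(deviate, 2), p < 0.01)
```

On this assumption the difference between .92 and .70 gives a deviate of about 3.5, which agrees with the reported P of less than .01.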

In all other instances, however, “any” co- 
efficients equaled or exceeded “hard” ones. 
“Any” coefficients were more consistently 
large; the smallest was .70. In contrast, the 
“hard” list includes values of .63, .59, and
.46. Further, “any” correlated significantly
higher than “hard” (P less than .05) in the
comparison of “before” cloze scores with
AFQT indices.


Discussion


1. Cloze Scores As Indices of Comprehen-
sion and Aptitude: The findings of this ex-
periment appear altogether consistent in their
support of the notion that “cloze” readability
scores are valid indices of the comprehensi-
bility of English prose—for the readers con-
cerned. At least with the test group and
materials employed, no contrary evidence ap-
peared, and the supporting evidence is of sev-
eral varieties.


Table 2

Means, Standard Deviations, Significance of Learning Gains
(with results of homogeneity of variance tests)

                          Cloze Data           Comprehension Data
Cloze Group                                               Immediate
(N)                    Before     After       Pretest     Recall       AFQT

“Any”      Mean:       22.81      31.27       26.23       31.02        60.29
(48)       (SD):      (10.35)    (12.50)      (7.37)      (9.20)      (15.585)
           Diff.:            8.46                    4.79
           (t):             (9.55)                  (6.43)

“Easy”     Mean:       39.79      44.75       25.15       32.365       60.50
(52)       (SD):       (9.18)     (8.95)      (6.91)      (9.03)      (14.40)
           Diff.:            4.96                    7.21
           (t):             (6.23)                  (8.51)

“Hard”     Mean:       19.135     27.56       26.17       33.115       59.82
(52)       (SD):       (8.99)    (10.61)      (6.84)     (10.28)      (16.17)
           Diff.:            8.42                    6.94
           (t):            (10.40)                  (7.13)

Homog. Var.; Ps of
Diffs. Among:         .30-.50    .05-.10     .50-.70     .50-.70      .70-.80

Note. This table summarizes raw data for each group of subjects and shows gains attributable to learning between paired and correlated means. All small-sample t values significant to above .001 level of confidence. Results of Bartlett’s homogeneity of variance test rejected hypothesis that groups might be significantly heterogeneous.

If the comprehension tests, adopted as criteria,
really did index (a) knowledge of the
article's content before it was read, (b) the
relative amount of that content remembered
immediately after study, (c) the increase in
knowledge of the content brought about by
study, and (d) general aptitude or ability to
understand, in agreement with another criterion,
the AFQT, then it appears that cloze
scores did so too. Further, if the pretest of
comprehension really predicted how individuals
would do on their “final examination,”
the immediate recall test, it seems that
“before” cloze performances also did.

2. Operational Efficiency and Simplicity: 
Although cloze and comprehension tests were 
generally similar in the kinds of results they 
yielded, the two kinds of tests were very dif- 
ferent in the cost, effort, and time required 
for construction. The advantages seem to lie 
with cloze procedure in general, and with the 
“any” method of mutilation in particular. 

3. Interpretation of Differences Among 
Cloze Forms: Performances relative to the 
chance deletion of words by the “any,” 
“easy,” and “hard” methods not only indi- 
cate that “any” and “hard” yield more dis- 
criminating results than “easy,” but also that 
the “any” method is equal or superior to 
“hard” for all purposes except one—gauging 
preknowledge of technically worded content. 

4. For Readability Use, “Any” Forms 
Only: For the purpose of indexing individual 
differences among subjects with regard to 
specific text material, all three of the mutila- 
tion operations used appear to have produced 
adequate cloze forms. But for contrasting 
the relative difficulties of different materials, 
only the “any” method of mutilation seems 
justifiable. 

To restrict deletions to particular kinds of
words is to ignore the fact that those kinds
may not occur equally often in different materials.
That difference in frequency of occurrence
may itself be a readability factor;
if so, its effect should be included in, not
excluded from, the results.


Summary 


This experiment tried to test the validity
of “cloze procedure” scores as indices of the
relative “readabilities” of English prose
materials. The cloze score of a passage depends
on the individual scores of all subjects in a 
test group. In turn, the individual’s score 
depends greatly on how well he understands 
the materials administered, hence on such 
factors as language facility, pertinent specific 
knowledge, and native ability. 

It was hypothesized that the scores indi- 
viduals made on cloze forms based on a tech- 
nical article about the Air Force system 
of supply would correspond to independent 
measures of those individuals’ aptitude scores 
(on the AFQT) and their performances on 
carefully constructed and matched multiple-choice
pre- and posttests of comprehension
about that article’s content. The design was
such as to explore certain other matters.

Cloze forms were based on a 20% sample
of the article. The sample was made up of
eight mechanically selected subsamples, and 
10 words were counted out of every subsam- 
ple by each of three systematic mutilation 
operations to produce “any,” “easy,” and 
“hard” cloze test assemblies, each of which 
had 80 blanks in which subjects were to write 
the words they guessed were missing. The 
“any” operation considered all words equally 
liable to deletion. The other two operations 
counted out only “easy-to-replace” and “hard- 
to-replace” parts of speech, as empirically 
determined from past cloze data. 
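
The “any” mutilation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the summary does not give the exact counting rule, so the sketch deletes evenly spaced words from one subsample, and it scores a blank as correct only when the deleted word itself is restored. The names make_any_cloze and score_cloze are hypothetical.

```python
import string


def make_any_cloze(text, n_blanks, blank="_____"):
    """Delete n_blanks evenly spaced words, regardless of part of speech.

    Assumption: every k-th word is removed, where k = word count // n_blanks.
    """
    words = text.split()
    if n_blanks <= 0 or n_blanks > len(words):
        raise ValueError("n_blanks must be between 1 and the number of words")
    step = len(words) // n_blanks
    mutilated, answers = [], []
    for position, word in enumerate(words, start=1):
        if position % step == 0 and len(answers) < n_blanks:
            answers.append(word)       # keep the deleted word as the key
            mutilated.append(blank)    # leave a blank for the subject to fill
        else:
            mutilated.append(word)
    return " ".join(mutilated), answers


def _normalize(word):
    """Ignore case and surrounding punctuation when comparing words."""
    return word.strip().strip(string.punctuation).lower()


def score_cloze(responses, answers):
    """Count blanks filled with the deleted word (assumed exact-match scoring)."""
    return sum(_normalize(r) == _normalize(a) for r, a in zip(responses, answers))
```

Calling make_any_cloze on each of the eight subsamples with n_blanks=10 would give an 80-blank form of the kind described; the “easy” and “hard” operations would need a part-of-speech filter that is not sketched here.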

Each kind of cloze assembly was assigned
to a different one of three test groups, of
about 58 subjects each, randomly selected
from three flights of Air Force trainees.
Each group had two experimental sessions a 
week apart. In the first, a subject was ad- 
ministered a cloze form, then the pretest of 
knowledge; in the second he was given the 
article to study, then the immediate recall 
test, then another copy of the same cloze 
form. 







Only subjects who were present through- 
out both sessions, who completed all four 
tests, and who had AFQT scores on record 
were represented in final data. Analysis 
dealt with 48 subjects in the “any” group 
and 52 each in the “easy” and “hard” groups. 

For each of the three groups, correlation 
coefficients were computed between all 10 
possible pairings of five distributions of scores 
(before- and after-study cloze, pre- and post- 
tests of comprehension, and AFQT). All 30 
such coefficients were found to be positive 
and significant to beyond the .001 level of 
confidence. 
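
The count of coefficients follows from simple combinatorics: five score distributions give 10 unordered pairs, and three groups give 30 coefficients. A minimal sketch, assuming ordinary product-moment correlations and using hypothetical measure names:

```python
from itertools import combinations
from math import sqrt

# Hypothetical labels for the five score distributions named in the text.
MEASURES = ["cloze_before", "cloze_after", "pretest", "immediate_recall", "afqt"]
GROUPS = ["any", "easy", "hard"]

# Every unordered pairing of the five measures.
PAIRINGS = list(combinations(MEASURES, 2))


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_table(scores):
    """scores maps each measure name to one group's list of subject scores."""
    return {(a, b): pearson_r(scores[a], scores[b]) for a, b in PAIRINGS}
```

Running correlation_table once per group yields 10 coefficients each, 30 in all.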

Also for each group, the mean differences 
in scores attributable to learning during 
study, between paired cloze tests and between
the comprehension tests, were found
significant to beyond the .001 level. 
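
A gain between paired scores of this kind is conventionally tested with a t ratio for correlated means. The sketch below assumes that standard formula (mean difference divided by its standard error); the summary does not spell out the computation, and the function name paired_t is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (mean gain, t ratio, degrees of freedom) for paired scores."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_gain = mean(diffs)
    standard_error = stdev(diffs) / sqrt(n)   # sample SD of the differences
    return mean_gain, mean_gain / standard_error, n - 1
```

With each subject's before-study and after-study cloze scores as the two lists, the first value returned corresponds to the "Diff." rows of Table 2 and the second to the "(t)" rows.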

Comparisons of the results for the three 
test groups indicated that, in general, the 
“any” cloze form, which is far the simplest 
to construct, yielded more stable, reliable, 
and discriminating results than did the “easy” 
and “hard” forms. “Any” coefficients, which 
ranged from .70 to .88, were larger than all
corresponding “easy” ones, and they equaled
or exceeded all except one “hard” coefficient
(“before” cloze vs. pretest of knowledge).
“Any” and “hard” yielded equally significant
learning gains, ones somewhat larger than the
corresponding comprehension tests did.

Received March 28, 1956


References 

1. Lorge, I. Readability formulae—an evaluation. Elem. English, 1949, 26, 86-95.

2. Osgood, C. E., & Sebeok, T. A. (Eds.) Psycholinguistics: a survey of theory and research problems. J. abnorm. soc. Psychol. (Suppl.), 1954, 49, No. 4, Part 2.

3. Taylor, W. L. The cloze procedure: how it predicts comprehension and intelligence of military personnel. Urbana, Ill.: Div. of Communications, Univer. of Illinois, 1953. (USAF, HRRI Tech. Memo. No. 13.)

4. Taylor, W. L. “Cloze procedure”: a new tool for measuring readability. Journalism Quart., 1953, 30, 415-433.

5. Taylor, W. L. KM readers lend hand to science; ‘cloze’ method works in written Korean and may serve as a tool for Korean language reform. Korean Messenger, 1954, 3, 5-4.

6. Taylor, W. L. Application of ‘cloze’ and entropy measures to the study of contextual constraint in samples of continuous prose. Unpublished doctor’s dissertation, Univer. of Illinois, 1954. (Dissertation Abstracts, 1955, 15, 464-465.)

7. Taylor, W. L. Recent developments in the use of “cloze procedure.” Journalism Quart., 1956, 33, 42-46, 99.

8. Taylor, W. L. Cloze procedure. Agrisearch, 1956, 2, 1-4.

9. Taylor, W. L. Readability research. (In Japanese; Shin-ichi Ito, Trans.) Shinbun Kenkyu, 1956, No. 57 (April), 16-20, No. 59 (June), 11-14, No. 61 (August), 27-31.





Journal of Applied Psychology
Vol. 41, No. 1, 1957


Attitudes of White and Negro High School Students in a 
West Texas Town Toward School Integration 


Herbert Greenberg, Arthur L. Chase, and Thomas M. Cannon, Jr.¹


Texas Technological College 


With the United States Supreme Court de- 
cision in May, 1954, ruling segregation in 
public schools unconstitutional, those areas 
of the country where segregation has been 
the traditional practice are for the first time 
faced with the immediate problems of integration.
Those most intimately concerned
with the effects of the ruling (the educators,
school administrators, parents of school children,
and of course the students themselves)
must not only face the problems that may
arise in the integrated situation, but must at- 
tempt to anticipate the nature of such prob- 
lems in order to be able to take more intelli- 
gent action in coping with them. 

Many studies have been made using tests 
which measure attitudes toward other races. 
The test devised by Marks (15) and those 
employed by Sims and Patrick (23) and 
Mayo and Kinzen (17) are examples of such 
devices. The studies by Gray and
Thompson (10) and Prothro (19) are typical
of those which have attempted
the measurement of attitudes toward several 
groups rather than focusing only on one. In 
a university summer school workshop on 
school integration (5), a group of educators 
was asked the question, “What is the single 
foremost problem of school integration that 
you think that you will face?” A majority
of these educators felt that outstanding among 
the problems would be those arising from 
student attitudes at the initiation of integra- 
tion. There has, however, been little work 
done in the area of measuring and evaluating 
the attitudes of the Negro and white students 
themselves. 


Purpose 


It is the purpose of the present study to try to ascertain student attitudes toward the various situations in which they will find themselves as a result of integration. A secondary concern is to attempt to determine whether negative attitudes toward integration would correlate significantly with authoritarian attitudes as measured by the F scale.

¹ Cooperating researchers were: Carolyn Dennis, Ernestine Dobbins, Eleanor Miller, Georgia Smith, Doyle Taylor, John Taylor.

Population. Subjects were students of two segregated public high schools in a West Texas town of approximately 25,000 population.

Subjects were divided into the following four subgroups:

A, 114 white seniors; B, 119 white sophomores; C, 26 Negro seniors, juniors, sophomores; D, 23 Negro freshmen.

The total Negro high school group and total white senior group were tested, while approximately 60 per cent of the white sophomores, picked at random, were employed.

Procedure. The test battery included: 1. The California F Scale (referred to as the “F” scale); 2. The Integration Attitude Scale (referred to as the “IA” scale).

The IA scale, a 29-item questionnaire developed by the researchers, was designed to measure attitudes toward specific areas involved in school integration. The same answering and scoring methods for the IA and the F scales were employed with high scores denoting unfavorable desegregation attitudes and high authoritarianism respectively. The F and IA scales are scored on a scale from a maximum democratic or prointegration score of 29 to a maximum authoritarian and anti-integration score of 203. The median score is 116.

Due to the nature of the IA scale, it was deemed important that no indication of either pro or con attitudes toward desegregation on the part of the researchers be assumed from the direction of wording of scale items. Therefore, the IA scale was designed with positively worded statements alternating with negatively worded statements, i.e.:

Question 1. If another race was integrated into my school, I would do my best to accept them as classmates and equals.

Question 2. I think the scholastic level of my school would fall if other races were integrated into the school program.

Thus, in order to conform to the scoring method of the F scale, the algebraic signs of the answers to the positively worded statements were changed prior to tabulation.
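
The scoring rule can be illustrated with a short sketch. Two points here are assumptions rather than statements from the article: that each signed answer is converted to a 1-to-7 item score by adding 4 (which is consistent with the stated range of 29 to 203 for 29 items, and with 29 × 4 = 116 as the middle score), and that the positively worded statements are the odd-numbered ones (inferred from the alternating design and the two sample questions). The names score_item and score_ia are hypothetical.

```python
# Assumption: odd-numbered items (1, 3, ..., 29) are the positively worded ones.
POSITIVELY_WORDED = set(range(1, 30, 2))
VALID_RESPONSES = (-3, -2, -1, 1, 2, 3)


def score_item(item_number, response):
    """Convert one signed answer (-3..+3, no zero) to a 1..7 item score.

    The sign is reversed for positively worded items so that a high score
    always denotes an attitude unfavorable to desegregation.
    """
    if response not in VALID_RESPONSES:
        raise ValueError("response must be one of -3, -2, -1, 1, 2, 3")
    if item_number in POSITIVELY_WORDED:
        response = -response
    return response + 4   # assumed constant mapping -3..+3 onto 1..7


def score_ia(responses):
    """responses maps item number (1..29) to the signed answer given."""
    if sorted(responses) != list(range(1, 30)):
        raise ValueError("expected answers to items 1 through 29")
    return sum(score_item(item, answer) for item, answer in responses.items())
```

Under these assumptions a respondent who strongly agrees with every positively worded statement and strongly disagrees with every negatively worded one scores 29, and the opposite pattern scores 203.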

Means and standard deviations were computed for all subgroups for both tests. t ratios were computed to determine significance of differences between means of the comparison subgroups for each scale. Analyses were made of selected items on the IA scale to determine specific area responses and item interrelation.

Standardized administration was employed. The anonymity of subjects was assured by instruction to omit names from the scales.


Results 


Table 1 shows means and standard deviations
for subgroups, and t tests between designated
comparison subgroups. It should be
noted that, for either scale, there are no significant
differences between classes within
racial groups. Significant differences do exist
between the Negro-white groups for both
scales. It is interesting to note further that,
though the Negro group is significantly more
authoritarian than the white, it is significantly
more positive toward integration than
the white group.

Table 2 shows rank-order correlations be- 
tween the two scales for the four sample sub- 
groups. It can be seen that the correlation 
between the scales is very low, the only sta- 
tistically significant one (1% level) being the 
+ .32 for the white seniors. 
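
A rank-order correlation of the kind reported in Table 2 can be computed as follows. The article does not name the coefficient; this sketch assumes Spearman's rho, obtained by ranking both sets of scores (tied scores share the average rank) and correlating the ranks. The function names are hypothetical.

```python
from math import sqrt


def ranks(values):
    """Rank scores from 1 (smallest); tied scores share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with the score at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Product-moment correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)
```

Here x would hold a subgroup's F-scale totals and y the same subjects' IA-scale totals.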

Table 3 presents the 29 items of the IA scale, with the approximate percentage of each subgroup responding with positive integration attitudes toward each question. The first fact to be noted is that more than 80% of all white students and 100% of the Negro group would do their best to accept school integration. A further scan of the table indicates a progressively lower percentage of positive attitudes as the degree of personal-social contact becomes closer. Thus, a large majority would accept members of the other race in school (Item 1), in church (Item 17), in band, choir (Item 5), and athletic teams (Item 11), and even as representatives of their school in interscholastic functions (Item 10). A smaller majority do not favor segregation in classrooms and advisories (Item 4). The majority decreases further when discussing integration in cafeterias (Item 6), and becomes a slight minority among white students when shower-room, rest-room (Item 16), etc., integration is discussed. A considerable majority of the white students would not accept a member of the other race as a best friend (Item 15), while a still smaller minority of the whites would dance with (Item 25), favor parties with (Item 12), or double-date with (Item 21) members of the other race. These findings appear to substantiate the work which has been done with the Bogardus Social Distance Scale (20).


Table 1
Means, Standard Deviations, and T Scores Between Groups *

           Mean      SD        Mean      SD        t test    t test
Group      F Scale   F Scale   IA Scale  IA Scale  F Scale   IA Scale

A          130.6     19.31     104.6     36.84
B          134.7     19.89     102.9     38.60
C          155.7     17.36      69.8     20.02
D          156.5     14.76      67.7     12.39
A-C                                                6.49**    6.46**
B-D                                                6.55**    8.30**
A-B                                                1.59      0.003
C-D                                                0.001     0.004

* A: White Seniors; B: White Sophomores; C: Negro Seniors, Juniors, Sophomores; D: Negro Freshmen.
** Significant beyond the .01 level.


Table 2
Rank-Order Correlations Between Scales (F and IA) for Each Subgroup

           Rank-order
Group      Correlation

A          +.32
B          +.18
C          -.17
D          -.16

It can also be seen that more than half the 
white students would be opposed to being in- 
structed by teachers of another race (Item 7). 
An even larger number of whites and about 
half the Negroes consider dating between 
races (Item 20) to be a potential problem 
soon after integration. 

Lastly, an interesting result might be found 
in Item 22. Of the white students 73% of 
the seniors and 66% of the sophomores did 
not think their race was superior; while 
among the Negro group 85% of the seniors 
and 86% of the underclassmen did feel that 
their race was superior and should be ac- 
cepted as such regardless of what anyone else 
said. 

IA Scale: Form 1

AGE____  CLASSIFICATION____  SEX____  APPROXIMATE GRADE AVERAGE____

This questionnaire has been devised to measure your attitudes. There are no “right” answers and no “wrong” answers; the only right answer is the one which best reflects your true personal opinion toward the question considered.

To answer questions, choose the answer below which corresponds most closely with your personal attitude toward the particular question, and place the corresponding number in the space provided at left.

+ (Plus) 3 for strongly agree        - (Minus) 3 for strongly disagree
+ (Plus) 2 for agree                 - (Minus) 2 for disagree
+ (Plus) 1 for mildly agree          - (Minus) 1 for mildly disagree

1. If another race was integrated into my school, I would do my best to accept them as classmates and equals.
2. I think the scholastic level of my school would fall if other races were integrated into the school program.
3. I would be willing to accept, as an equal, a member of another race into a club to which I belonged.
4. I believe that members of the other race should have separate advisories and separate seats in assemblies.
5. I believe that any student who has the ability should be eligible for the band and/or choir regardless of his race.
6. Racial groups should sit at separate tables in the cafeteria.
7. It would make no difference to me if my teachers were of my own race or a different one.
8. I would hesitate to bring students of another race home with me because I do not think my parents would approve.
9. Every student should have equal rights in regard to holding a class office, position as cheerleader, etc., regardless of his race.
10. I would not approve of a student of another race representing my school at statewide functions (Boy's State, Hi-Y conventions, etc.).
11. I believe that every student, regardless of race, should be eligible for school athletic teams, if he has the ability to make the team.
12. Different racial groups mixing at school functions (dances, parties, etc.) will not be wise; it will only result in fights and ill feeling between races.
13. Members of any race should be allowed to sit anywhere on busses, in movies, at ball games, etc.
14. Having members of other races on my school’s athletic teams would result in more “dirty playing” and unsportsmanlike conduct.
15. I believe that a member of the other race could become a very close friend of mine (possibly even my “best friend”).
16. When integration is accomplished, separate shower facilities and locker rooms should be provided for the different races in Physical Education classes.
17. I would not mind having a member of another race as a member of my church.
18. I do not think that my parents would want to work on school parent committees, such as the PTA, with parents of another race.
19. If I liked a person of the other race well enough, I would accept him into my personal group of good friends (“My gang,” etc.).
20. I believe that dating between races will be a serious problem soon after integration.
21. I would not mind “double dating” with a couple both of whom were of the other race.
22. Regardless of what anyone else says, I believe that my race is superior, and should be accepted as such.
23. The Supreme Court’s decision to integrate other races into white schools was just and timely.
24. I do not think I would be willing to sit next to a member of another race in class.
25. I would not mind dancing with a member of another race at a school or club function.
26. Separate rest room facilities and drinking fountains should be provided for each racial group.
27. There is no basic reason for feeling prejudiced against another race.
28. I would not vote for any candidate for student office unless he (she) was of my race.
29. Restaurants, movies, etc., should serve anyone, regardless of race.


Table 4 presents the product-moment correlations between several questions from the IA scale. As might be expected, there appears to be a definite correlation between those favoring segregation in advisories and those who favor it in school cafeterias (Items 4 and 6). Also, there is some correlation between those who would dance with a member of another race and those who would be willing to select a member of another race as a good friend (Items 15 and 25). Another result appears to be the lack of, and even negative, correlations between Items 1-8 and 8-15. In other words, many of those students who would do their best to accept integration (Item 1) and who would choose a member of another race as a good friend (Item 15) still would not bring these integrated students, even good friends, home to meet their parents due to the fear of parental disapproval (Item 8). This apparent conflict between student and parental attitudes is further indicated by the low correlation found between the students’ attitude toward segregation in the cafeteria (Item 6) and their willingness to bring students of another race home (Item 8).


Table 3
Subgroup Responses Favorable Toward Integration
on IA-Scale Items

Item  A  B  C  D      Item  A  B  C  D

85 100 16 46 80 78
50 82 17 84 81
65 100 18 50 56 84 85
89 19
100 20 29 31 46 50
85 21. +22 aS 6 9S
92 22 73 66 15 14
64 23. 00 75
24 71 84 78
71 -» 21 2 85
45 43 88 78
o4 27 72 75 88 85
28 75 75 88 85
89 29 63 96 96


Table 4
Interitem Product-Moment Correlations

Items
Correlated     Groups A B C D

+.23
+.31
+.47
+.04
+.55

-.15
+.77
+.26
-.37
+.34

1-8    -.65
4-6    +.68
6-8    +.29
8-15   -.08
15-25  +.39

-.18
+.66
+.47
+.03
+.08


Conclusions


1. Authoritarian attitudes of high school
students sampled in this study were not indicative
of negative attitudes toward integration.

2. Negro students in the segregated school 
systems under study show highly authori- 
tarian attitudes as well as strong positive 
attitudes toward all areas of school integra- 
tion. 

3. White students in the segregated school 
systems studied show high authoritarianism 
though less than Negro students. 

4. White students show a number of posi- 
tive attitude responses toward many aspects 
of school integration, thus easing the ex- 
pressed fear of widespread interracial conflicts 
in integrated schools in this area, though 
problems may arise in situations necessitat- 
ing close personal-social contact. 

5. A hypothesis might be posed from the 
correlations between Items 1-8 and 8-15
(Table 4) that there may be a difference in 
attitudes toward school integration between 
students and their parents. This hypothesis 
is suggested as being a fertile field for fur- 
ther study. 


Received March 22, 1956. 


References 


1. Adams, Edward L., Jr. Attitudes with regard to Minority Groups. J. Educ. Sociol., 1948, 21, 328-338.

2. Adorno, T. W., Frenkel-Brunswik, Else, Levinson, D. J., & Sanford, R. N. The authoritarian personality. New York: Harper, 1950.

3. Chase, W. P. Attitude of North Carolina college students toward the Negro. Psychol. Bull., 1939, 36, 617.

4. Christie, R., & Garcia, J. Subcultural variation in authoritarian personality. J. abnorm. soc. Psychol., 1951, 46, 457-469.

5. Cook, Paul. Problems of school integration. J. Negro Educ., 1954, 23, 438-486.

6. Dawkins, O. C. Kentucky outgrows segregation. Survey, 1950, 86, 358-359.

7. Dollard, J. Caste and class in a southern town. New York: Harper, 1949.

8. Foreman, C. Environment in Negro elementary education. New York: Norton, 1932.

9. Frazier, E. F. Negro youth at the crossways. Washington, D. C.: American Council on Education, 1944.

10. Gray, J. S., & Thompson, A. The ethnic prejudices of white and Negro college students. J. abnorm. soc. Psychol., 1953, 48, 311-313.

11. Himelhoch, J. Tolerance and personality needs. Amer. sociol. Rev., 1950, 34, 415-423.

12. Holbrook, S. A study of some relationships between Negro and white students in New York Public Schools. High Points, 1944, 26, No. 6, 5-17.

13. Konvitz, M. R. Court deals a blow to segregation: the separate but equal doctrine begins to crumble. Common Ground, 1951, 11, 148-168.

14. MacKenzie, B. K. Importance of contact in determining attitudes toward Negroes. J. abnorm. soc. Psychol., 1948, 43, 417-441.

15. Marks, E. S. Standardization of a race attitude test for Negro youth. J. soc. Psychol., 1943, 18, 245-247.

16. Maslow, A. H. The authoritarian character structure. J. soc. Psychol., 1943, 18, 401-411.

17. Mayo, D., & Kinzen, J. R. A comparison of the “racial” attitudes of white and Negro high school students in 1940 and 1948. J. Psychol., 1950, 29, 397-405.

18. Myers, C. M. S. A study of anti-Negro prejudice. J. Negro Educ., 1943, 13, 709-714.

19. Prothro, E. T. Ethnocentrism and anti-Negro attitudes in the deep south. J. abnorm. soc. Psychol., 1952, 47, 105-108.

20. Prothro, E. T., & Miles, Otha K. Social distance in the deep south as measured by a revised Bogardus Scale. J. soc. Psychol., 1953, 37, 171-174.

21. Pugh, R. W. Comparative study of adjustment of Negro students in mixed and separate schools. J. Negro Educ., 1943, 12, 607-616.

22. Scodel, A., & Mussen, P. Social perception of authoritarian and nonauthoritarian. J. Psychol., 1953, 48, 181-184.

23. Sims, V. F., & Patrick, J. R. Attitudes toward Negroes in northern and southern colleges. J. soc. Psychol., 1936, 7, 192-204.

24. Spellman, C. L. Notes on integrating the Negro minority. Social Forces, 1946, 25, 217-218.

25. Taylor, T. H. Intergroup relations at cosmopolitan junior high. J. Educ. Sociol., 1947, 21, 220-225.

26. Zeligs, Rose. Children’s intergroup attitudes. J. Gen. Psychol., 1948, 12, 101-110.








Journal of Applied Psychology 
Vol. 41, No. 1, 1957 


Some Factors Influencing Income Aspiration 


H. C. Ganguli 
Indian Institute of Technology, Kharagpur


Pay is an important factor in job satisfac- 
tion. In a number of studies here (1, 4) 
workers have rated income as the most im- 
portant item in a list covering different as- 
pects of the job. Satisfaction with pay itself 
depends on, among other things, size of actual 
income, wage differentials, income aspiration 
and method of payment. In a previous com- 
munication to the Indian Journal of Psy- 
chology the relation of method of wage pay- 
ment to industrial morale has been discussed. 
In the present article some factors related to 
income aspiration are brought out. 

This problem of the level of financial as- 
piration has been previously discussed in two 
American studies (2, 5) available to the au- 
thor. Centers and Cantril deal with certain 
aspects of income satisfaction and variations 
in income expectation in a sample of 1,165 
persons representing a cross-section of the 
American adult population. They note the 
differences in income satisfaction under dif- 
ferent conditions and factors that may be 
influencing income expectation. Thomsen 
studied the level of future financial expecta- 
tion of American college students. He con- 
cludes with the observation that paranoia 
and paranoid trends generally develop from 
an “expectation-achievement discrepancy.” 


The Present Study 


The results presented here form part of an 
attitude study made in two light engineering 
factories in Calcutta manufacturing sewing 
machines and electric fans. The detailed 
analysis reported here has been done on a 
sample of 534 workers in Factory A (called 
Group A), but the broad conclusions are sup- 
ported by findings from Group B consisting 
of 269 workers in another factory. Each 
factory has a total strength of about 1,850
workers and the samples were randomly se- 
lected. The workers studied were machine 
operators (e.g., lathe, drill, capstan) and 
craftsmen (e.g., fitters, carpenters). Regarding
age, education, etc., the two groups were
not very different.

The study involved deep interviews with 
individual workers lasting, on the average, 
for more than an hour and a quarter, and on 
the basis of a previously standardized scale. 
During the interview each worker was also 
asked to mention the monthly income to 
which he aspired. It was found that most 
of the respondents based their expectations 
on the money they felt to be necessary to 
maintain their family comfortably. How- 
ever, some workers mentioned their estimate 
of the worth of the work they do and the 
returns they feel they deserve from it as 
grounds for their aspirations. 

Table 1 gives a few characteristics of the 
two working groups. 


Results 


Table 2 gives the distribution of income 
expectations of these two groups separately. 
It is seen that 60% of workers in Group A 
aspire for an income of Rs. 125 or less per 
month whereas in Group B the 60% limit is 
Rs. 175. 
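
The 60% limits quoted above can be checked by accumulating the bracket percentages of Table 2. The sketch below transcribes only the brackets needed for each group and treats Group B's empty Rs. 50-75 cell as zero, which is an assumption; the function name is hypothetical.

```python
def first_limit_reaching(brackets, target):
    """Return (upper limit in Rs., cumulative per cent) once target is reached.

    brackets is a list of (upper limit, per cent choosing) in ascending order.
    """
    running = 0.0
    for upper_limit, per_cent in brackets:
        running += per_cent
        if running >= target:
            return upper_limit, round(running, 1)
    return None


# Per cent choosing each monthly income bracket, from Table 2.
GROUP_A = [(75, 0.9), (100, 38.7), (125, 21.9), (150, 23.6)]
GROUP_B = [(75, 0.0), (100, 5.2), (125, 10.5), (150, 33.0), (175, 13.4)]
```

Under this reading the 60% point falls at Rs. 125 for Group A (61.5% cumulative) and at Rs. 175 for Group B (62.1% cumulative), in line with the text.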

To determine the extent to which income 
expectation is influenced by other variables, 
correlations have been calculated between ex- 
pected income and the three variables—age 
of the worker, his length of service, and total 
monthly earnings—in each case after con- 
trolling the influence of the other two vari- 
ables. Expected income is significantly cor- 
related (.01 level) only with length of serv- 
ice and total income. The significant partial 
correlations are: with length of service .21 
(Group A) and .16 (Group B), and with 
total earnings .53 (Group A) and .42 (Group 
B). The net influence of age on income as- 
piration is not substantial, the partial corre- 
lations being only .07 (Group A) and .09 
(Group B).
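
Correlations that control for two other variables are second-order partial correlations. A minimal sketch of the usual recursive formula follows; the article does not show its computation, so this is an assumed standard method, and the function names are hypothetical.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_r_two_controls(r, x, y, z, w):
    """Partial correlation of x and y with both z and w held constant.

    r maps frozenset({a, b}) to the zero-order correlation between a and b.
    """
    def first_order(a, b, control):
        return partial_r(r[frozenset((a, b))],
                         r[frozenset((a, control))],
                         r[frozenset((b, control))])

    # Apply the first-order formula again to correlations already freed of z.
    return partial_r(first_order(x, y, z),
                     first_order(x, w, z),
                     first_order(y, w, z))
```

For the analysis described, x would be expected income, y one of age, length of service, or total monthly earnings, and z and w the remaining two variables.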

Mean income aspirations of workers in different earning and service groups have been calculated. These values again bring out the influence of these two variables. For example, among Group A workers earning less than Rs. 50 a month, mean income aspirations are Rs. 83, 109, 119, and 95 in the four service groups of 3 years or less, 3.1 to 5 years, 5.1 to 9 years, and above 9 years. Again, workers with more than 9 years’ service, for example, have increasingly higher monthly aspirations (Rs. 95, 114, 137, 174, 203, and 233) as their income increases gradually from Rs. 50 to nearly Rs. 200.


Table 1
Some Characteristics in Terms of Means and Standard Deviations of the Two Industrial Groups Studied

                               Total                   Mean                   Increase
                               Monthly    Percentage   Monthly     Percent    Wanted as a
          Age       Service    Earning    Satisfied    Income      Wanting    Percentage
          (in       (in        in Rs.*    with         Expec-      Increase   of Income
          yrs.)     yrs.)                 Income       tation      in Income

Group A   31.5      8.0        85.1       1            127.2       95         0.3
          ±8.7      ±4.7       ±31.4                   ±37.7
Group B   27.7      6.6        126.2      9            170.8       94         45.5
          ±6.2      ±3.6       ±34.9                   ±39.3

* One Rupee is equivalent to 21 cents.

In Factory B, the total earning of the 
worker was made up of a fixed daily wage 
plus a bonus calculated upon his weekly pro- 
duction. This bonus was paid on production 
above a fixed target and although the rates 
were not determined on the basis of a strict 
time study of the job, these were liberal 
enough to allow the worker substantial earn- 
ings. It was found, however, that the work- 
er’s income expectation is related more closely 
to his fixed daily wage than to his bonus or 
total earning. The correlations between ex- 
pected income on the one hand and daily 
wage, production bonus, and total earning on 
the other, controlling on the influence of 
length of service, are .55, .35, and .41, re- 
spectively. The reason seems to be that 
daily wage is more stable, forms the basis 
for the calculation of “dearness” allowance 
(i.e., allowance for balancing the increase in 
the cost of living), welfare fund, etc., and 
also reflects more closely the seniority and 
relative standing of the worker. 

Relation between income aspiration and 
education of the subject has been analyzed. 


The workers were divided into four classes 
according to the schooling received: (a) illit- 
erate workers; () workers educated up to 
Standard VI; (c) those educated up to Stand- 
ard VIII, and (d) those with a high school 
education. Table 3 gives the income expec- 


tations of different educational groups. It 
also gives their mean group expectations. 
From Table 3 it seems that there is a 


noticeable tendency, especially in the low 
wage groups, for the better educated worker 
to expect higher financial returns. The coefficient
of contingency between the four educational
groups and income expectation comes
to .25 and, since 1/√N is approximately .04,
this positive contingency value seems to be
significant. Not much difference in income
aspiration is evident, however, between the 
second and third educational groups. There- 


Table 2

Income Aspiration of Two Industrial Groups

Monthly Income               Per Cent Choosing
Expectation (in Rs.)       Group A      Group B

 50-75                       0.9
 76-100                     38.7          5.2
101-125                     21.9         10.5
126-150                     23.6         33.0
151-175                      4.9         13.4
176-200                      8.0         27.7
201-225                      0.2          3.0
226-250                      1.9          6.4
251-275                                   0.4
276-300                      0.2          0.4










Table 3

Mean Income Expectation of Differently Educated Working Groups with Different Income Groups
Treated Separately (Group A) *

                                    Mean Income Aspiration (in Rs.)

                                      Workers with Income of                     Group
Education Level        N    Rs. 51-80  Rs. 81-110  Rs. 111-140  Rs. 141-170   Expectation

Illiterate            151      103        115          138          178           131
Standard I-VI         267      107        131          184          163           145
Standard VII-VIII      61      113        137          154          155           146
High school            55      119        148          267          173           153

* Income groups with inadequate number of cases in them have not been shown.


fore, income expectations seem to contrast 
more among the three groups of workers— 
those without any education, those with mod- 
erate education, and those with high school 
education. And to some extent, the higher 
the education, the higher the income aspira- 
tion of the worker. 
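
The significance argument for the contingency coefficient can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that N is the 534 workers listed in Table 3 (151 + 267 + 61 + 55) and uses Pearson's usual definition of the contingency coefficient, C = sqrt(chi-square / (chi-square + N)). The chi-square value is back-calculated from the reported C of .25; it is not a figure given in the article.

```python
import math

# Group sizes by education level, as listed in Table 3 (Group A).
group_sizes = [151, 267, 61, 55]
n_total = sum(group_sizes)


def contingency_coefficient(chi_square, n):
    """Pearson's contingency coefficient: C = sqrt(chi2 / (chi2 + n))."""
    return math.sqrt(chi_square / (chi_square + n))


def chi_square_from_c(c, n):
    """Invert the formula above: chi2 = c^2 * n / (1 - c^2)."""
    return c * c * n / (1.0 - c * c)


reported_c = 0.25  # coefficient of contingency quoted in the text
implied_chi_square = chi_square_from_c(reported_c, n_total)  # derived, not reported

print(f"N = {n_total}")
print(f"1/sqrt(N) = {1 / math.sqrt(n_total):.3f}")
print(f"chi-square implied by C = .25: {implied_chi_square:.1f}")
```

Under these assumptions 1/√N comes to roughly .04, the yardstick quoted in the text, and the reported coefficient is several times larger than it.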

According to the nature of work done, workers
were classified as machinists, craftsmen and
helpers, highly skilled, skilled, and semi- 
skilled. No sharp pattern in variation in in- 
come aspirations of different job groups is 
noticeable except for a slight general trend 
on the part of machinists, craftsmen, and 
helpers to have gradually decreasing expecta- 
tions (Table 4). This is seen more promi- 
nently for semiskilled craftsmen and helpers 


who usually have lower aspirations than ma- 
chinists or skilled craftsmen, even when they 
are earning about the same. 


Discussion and Conclusion 


The question that factory management fre- 
quently asks is which level of income the 
workers consider suitable and that would 
satisfy them. Indian studies have shown 
that income is considered to be the most im- 
portant aspect of employment and that a sub- 
stantial proportion of workers in each income 
group is dissatisfied. But what this general 
level of financial aspiration is and what the 
factors are that go into determining this have 
not been systematically studied. 


Table 4 


Variation in Income Aspiration with Nature of Work Done in Groups A and B 
(Different Income Groups Treated Separately *) 


Income Aspiration (in Rs.) 


Workers 
with Income 
Rs. 51-80 


Workers 
with Income 
Rs. 81-110 


Workers 
with Income 


Mean Group 
Rs. 1114 


Mean Group 
Expectation 


Earning 
A B A A B A B A B 
118 144 145 
123 153 
117 144 
109 127 
110 116 


Skilled machinists 
Semiskilled machinists 
Highly skilled craftsmen 
Skilled craftsmen 
Semiskilled craftsmen 
Skilled and semi- 
skilled helpers 102 
Manual helpers 91 


175 
176 
192 
167 
165 


141 
150 
123 114 
91 


123 


* Income groups with low frequencies not shown.







Table 5

Income Satisfaction and Income Expectation of Workers with Different Incomes (Group A)

                                  %           No.        Mean       Mean        Mean Increase
                                  Dissatis-   Wanting    Income     Increase    Wanted in
                                  fied in     Increase   Aspira-    Wanted      % of Total
Income Groups             N       Income                 tion       (in Rs.)    Earnings

Rs. 50 and less           15       100          15        106         63          151
Rs. 51-80                291        98         290        108         54           68
Rs. 81-110               125        91         121                    50           55
Rs. 111-140               68        94          61                    48           48
Rs. 141-170               22        73          18                    34           23
Rs. 171+                  13        69                                26           14





During an attitude survey in two Calcutta 
engineering factories this problem was studied. 
A number of points have come out. With 
reference to income expectation, discussions 
have shown that a majority of workers think 
in terms of comfortable living for their 
family. This may be because most incomes 
do not allow such a standard of living. 
Others think of their expectations in terms 
of an adequate return for their contribution 
to the job. An analysis of factors that de- 
termine the level of financial aspiration sup- 
ports the statement of Centers and Cantril 
that, “For those who are dissatisfied, it is 
generally true that the more money a person 
has the more money he wants.” The correlation
between present earnings and income expectation
is always above .4. In other words,
a person’s present earnings serve as a frame
of reference by which he sets his aspirations.
There is a saturation level, however, since in 
the higher income groups an increasing per- 
centage of workers do not desire or expect any 
further increase in income (Table 5). Fur- 
ther, although with increasing income abso- 
lute expectations go up, the aspiration in re- 
lation to the present earning decreases. For 
example, Table 5 shows that whereas the 
mean absolute expectation goes up from Rs. 
106 to Rs. 183 as the income increases, the 
increase in income desired either in rupees or 
as a percentage of total income actually 
diminishes. 

The other important influence on level of 
aspiration is the period for which the person 
is employed in the factory. Expectation in- 
creases with increase in length of service. A 


senior person expects higher wages than a 
comparatively new employee. The multiple 
correlation between expected income and to- 
tal earning and period of service is: 

                            Level of
                 R        Significance

Group A        .618           .01
Group B        .539           .01


Incidentally, the age of the worker is defi- 
nitely a less important factor than his period 
of service in determining his level of income 
aspiration. Actually, the present study shows 
that age is not a significant factor in the de- 
termination of income aspiration. 

It has also been found that a person’s
financial aspiration depends to some extent 
on his education. A person with a high 
school education expects more money than a 
person educated up to Standard VIII, and 
the latter, in turn, has a higher aspiration 
than an illiterate or just-literate worker. Al- 
though it is quite possible that aspirations 
depend to some extent on nature of work 
done, no substantial difference has been found 
between aspiration of skilled machine opera- 
tives and craftsmen. Semiskilled craftsmen 
and helpers, however, aspire to an income 
level generally lower than that of skilled 
craftsmen and machinists. 

In conclusion, it must be pointed out that 
income expectation has been found to be a 
very important factor related to the job 
satisfaction of the worker. For workers in 
Factory B, for example, over-all morale 
scores were determined on the basis of 12 
highly discriminating statements belonging to 










Table 6

Percentage of High Morale and Low Morale Workers
in Different Income Aspiration Groups
(Factory B)

                      High        Low
Expected              Morale      Morale
Income                Workers     Workers     Total

Rs. 180 and less       54.6        45.4        100
Above Rs. 180          42.1        57.9        100

χ² = 9.58


different attitudinal areas like satisfaction 
with nature of work done, wages, supervision, 
and company policies and practices (3). The 
group was divided along the median morale 
score into a high morale group and a low 
morale group. As Table 6 shows, workers 
with higher financial expectations were sig- 
nificantly more dissatisfied than those with 
lower expectations. This is mostly true even 
when influences of other factors like earn- 
ings, length of service, supervision, etc., were 
kept constant. The correlation between 
worker morale and expected income after 
partialing out effects of total earning and 


service was -0.23 and is significant at the
.01 level. Thus, further studies of this problem
are desirable. It would also be interesting
to note if and how variation in income
aspiration is influenced by social-cultural 
factors. A comparison of present findings 
with results from American studies, however, 
does not indicate any marked difference in 
aspiration patterns. 


Received April 23, 1956. 


References 


1. Bose, S. K. Man and his work. Presidential address
to Section of Psychology, 38th Session
of the Indian Science Congress Association,
Bangalore, 1951.

2. Centers, R. T., & Cantril, H. Income satisfaction
and income aspiration. J. abnorm. soc. Psychol.,
1946, 41, 64-69.

3. Ganguli, H. C. A study on effect of union membership
on industrial morale. Indian J. Psychol.,
1954, 29, 45-59.

4. Ganguli, H. C. Enquiry into incentives for workers
in an engineering factory. Indian J. soc.
Wk., 1954, 15, 30-40.

5. Thomsen, A. Expectation in relation to achievement
and happiness. J. abnorm. soc. Psychol.,
1943, 38, 58-73.




Journal of Applied Psychology
Vol. 41, No. 1, 1957


Factors in Sales Success ¹


Donald E. Baier 


General Electric Company, Schenectady, New York 


and Robert D. Dugan 


State Farm Insurance Companies, Bloomington, Illinois 


This is a report of relationships among 
criteria of performance as a salesman and 
measures reflecting the salesman’s product- 
knowledge; belief in his product; motivation; 
and length of service. The data presented 
are consistent with the conclusion that the 
salesman’s belief in his product (as measured 
by his own buying behavior) and his motiva- 
tion are more important in determining how 
well he does his job than is product knowl- 
edge. Length of service shows no significant 
relation to job performance. 


Method 


The job studied was that of the combination agent
who sells and services both ordinary and industrial
life insurance for which premium collections are
made weekly, monthly, quarterly, or less frequently.
An objective composite measure of job performance
(Total % Par) was correlated with each of 17 other
variables in a sample of 346 agents having more than
three months’ service; intercorrelations of all variables
were obtained. The period covered by the
job-performance measure varied between three and
twelve months. The measure was determined as of
December, 1955. The Information Index, a test of
life-insurance knowledge, was administered in August,
1955, and data on life-insurance ownership were
also obtained at this time.


Results 


The results of the correlational analysis are 
shown in Table 1. A correlation of .13 rep- 
resents the .01 level of significance; .10 is at 
the .05 level. 

Inspection of the correlation coefficients in 
the last column reveals that the values re- 
flecting relations between Total % Par and 
the elements composing this composite fall 
into an expected pattern. Average sales com- 
mission shows the highest correlation (.66) 

¹ This study was completed while both authors
were members of the Personnel Research Department
of the Commonwealth Life Insurance Company,
Louisville, Kentucky.




with the composite as would be expected of
a variable based on the average of Ordinary,
Monthly Debit Ordinary (MDO), and Weekly
Premium (WP) sales commissions. The three
elements together constitute three tenths of
the nominal weight of Total % Par. The respective
correlations for the elements are .39,
.48, and .46, which is consistent with the equal
weight and emphasis given them by management.
The three lapse rate factors also show
approximately equal though lower correlations
(-.24, -.26, -.25), which are negative
because the higher the rate of lapse of insurance
for which the agent is responsible the
poorer is his performance. The WP % Arrears
show no significant correlation (.06)
with Total % Par, but that for MDO %
Arrears (.14) is significant beyond the .01
level. These two measures together have
only the weight of any other single element
in determining Total % Par. They are shown
separately merely because of computational
requirements arising from the fact that each
is a measure of the proportion of money which
is due to be collected weekly or monthly that
is actually collected on schedule.

The final element entering the composite 
labeled Total % Par is % Recall Commission 
and it shows no significant relation (.04). 
The composite thus accords nominal weights 
of 3 for sales, 3 for conservation of business 
(lapse rates) 1 for collection and 1 for per- 
centage of commission credited to the agent’s 
pay account that is later withdrawn because 
the business on which it was based lapsed 
prior to the first anniversary of the issuance 
of the insurance. 

Aside from the intercorrelations of criterion 
elements with the composite, one sees in 
Table 1 that Total % Par is most signifi- 
cantly associated with Life Insurance Owned 
(.30). Other correlations significant at the 





Table 1

Intercorrelations Among Predictor and Criterion Measures
(Agents with more than three months' service)

[Correlation matrix illegible in the source scan.]


.01 level or better are with WP Debit size
(.23), MDO Debit size (.21), and DLB Correspondence
Course (.18). Attendance at
the Job Fundamentals School (.11) and Information
Index score (.12) become significant
as the criterion of significance is shifted
from the .01 toward the .05 level.

Notice the significant associations between 
Life Insurance Owned and Information Index 
score (.26); attendance at Job Fundamentals 
School (.19); completion of DLB Corre- 
spondence Course (.17). Observe also that 
Information Index score fails to correlate sig- 
nificantly with attendance at Job Fundamen- 
tals School (.07); with completion of DLB 
Correspondence Course (.04) and is only 
marginally related to Total % Par (.12). 
At the same time Total % Par correlates 
significantly with completion of DLB (.18) 
and with Life Insurance Owned (.30) as we 
have already noted above. 

Comment on the remaining significant cor- 
relation coefficients shown in Table 1 would 
tend to lead into an analysis of the rationale 
of the composite criterion which is outside 
the scope of this article. 

No rigidly controlled studies of the reli- 
ability of the job-performance criteria are 
available. In a related study the self-corre- 
lations of the composite (Total % of Par) 
and its components across several different 
overlapping periods of time yielded coeffi- 
cients which varied from .36 to .87. 


Discussion 


Criteria. The system for providing the 
composite criterion, Total % Par, had been 
constructed through the joint efforts of the 
authors and sales management personnel. It 
had two primary objectives: (a) To define 
the major work areas of the agent’s job and 
report to him periodically on his perform- 
ance in each so that his own and manage- 
ment’s efforts to do a well-balanced, com- 
petent job would be properly guided; (b) To 
provide an economical yet technically ade- 
quate criterion for research purposes which 
would be readily and continuously available. 
The pattern of interrelations among criterion 
elements and the composite is consistent with 
expectations and management’s heavy em- 


phasis on sales and conservation of life in- 
surance as major components of agent job 
performance. The lack of close association 
between the collection element and the other 
elements or the composite was expected. 
This is a vital job element but performance 
in it is evidently not closely related to other 
important aspects of the job. A closer relation
between % Recall Commission and Total
% Par or its elements was expected until
it was remembered that more than half the 
agents included in the study had less than 
two years of service. A minimum of two 
years is required before the full impact of 
this element is realized since the amount of 
recall of commission tends to increase through- 
out the first two years of an agent’s service. 
This is true because the amount of commis- 
sion withdrawn is determined by the amount 
of premium that would fall due during the 
first year of the insurance which remains un- 
paid when the policy lapses. Hence, an agent 
in the twelfth month of his service who sold 
a very large insurance policy which then 
lapsed in its eleventh month would experi- 
ence during his twenty-fourth month of serv- 
ice withdrawal of part of the substantial 
commissions previously credited. The data 
presented in Table 1 plus later studies not 
reported in this paper support the conclusion 
that the twofold objective of the criterion 
system was attained. 

Other variables versus criteria. Amount of 
Life Insurance Owned by the agent showed 
the highest correlation with Total % Par 
(.30). Unpublished selection research by the 
authors and others has shown that the amount 
of life insurance owned at time of application 
for the agent’s job is predictive of later per- 
formance criteria. It has also been deter- 
mined that the agents in the present study 
own significantly more life insurance than 
agents with three months or less service. 
Reference to Table 1 shows that Life Insur- 
ance Owned is significantly related to all job 
elements except business persistency, MDO 
debit size, and length of service. However, 
it is not intended to suggest that sales man- 
agers will necessarily have better life-insur- 
ance agents by selling them life insurance. 

These findings do suggest that the life-in- 










surance salesman whose personal life-insur- 
ance program reflects stronger belief in the 
value of his product compared to other pur- 
chasable goods or values is the more effective 
agent. That the ownership of more life in- 
surance by the better agents is not simply a 
consequence of higher economic status is sug- 
gested by the following facts. Analyses by 
the Life Insurance Agency Management As- 
sociation of 1950 census data on family ex- 
penditures show that while the amount spent 
for life insurance increases with income, there 
is much greater variation in these expendi- 
tures within a given bracket than for other 
categories of spending. Also, average agent 
earnings of $80 to $90 per week make it 
quite possible for the agent who thoroughly 
understands and is sold on his product to own 
a considerably larger volume of insurance 
than the average of $10,891 for the agents 
studied. The distribution of agent-owned
life insurance for the sample of this study
is: $5,000 = 25th percentile; $10,000 = 54th
percentile; $15,000 = 76th percentile; $20,000 =
91st percentile.
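
To locate the sample mean of $10,891 within this distribution, one can interpolate between the reported percentile points. The sketch below assumes straight-line interpolation between adjacent points, which the article does not state, and the function name `percentile_rank` is purely illustrative.

```python
# Reported points of the agent-owned life insurance distribution:
# (amount in dollars, percentile rank).
points = [(5000, 25), (10000, 54), (15000, 76), (20000, 91)]


def percentile_rank(amount, pts):
    """Linearly interpolate a percentile rank between reported points.

    Returns None outside the reported range instead of extrapolating.
    """
    for (x0, p0), (x1, p1) in zip(pts, pts[1:]):
        if x0 <= amount <= x1:
            return p0 + (amount - x0) * (p1 - p0) / (x1 - x0)
    return None


mean_owned = 10891  # sample mean quoted in the text
rank = percentile_rank(mean_owned, points)
print(f"${mean_owned:,} falls near percentile {rank:.0f}")
```

On this reading the mean sits a few points above the 54th percentile, so somewhat more than half of the agents own less than the average amount.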

The pattern of correlations found in Table 1
and pointed out above suggests that the agent 
who owns more life insurance has greater 
product-knowledge (as measured by the In- 
formation Index) and is more likely to have 
completed the DLB Correspondence Course 
covering his job. Enrollment in and comple- 
tion of this course is voluntary and may, 
therefore, be considered to reflect interest in 
and motivation for doing the agent’s job well. 
That motivation and belief in product are 
more important to agent performance than 
technical knowledge is a conclusion which 
seems warranted by the data. For it will be 
recalled that the Information Index score is 
not significantly related to attendance at Job 
Fundamentals School or to completion of 
DLB and only related to the extent of .12 
to Total % Par. Yet Total % Par corre- 
lates .30 with Life Insurance Owned and .18 


with DLB, albeit the latter correlation de- 
rives from a split in the sample such that only 
38 men completed DLB. An earlier study 
(1) showed no significant correlation between 
Information Index scores and district office 
records of average agent sales performance 
when the .01 level was used as a criterion of
significance. The present study finds the cor- 
relation between Information Index and Ordi- 
nary sales just reaching the .01 level; for 
MDO sales it is just below this level but is 
insignificant in the case of WP sales. 

Years of employment appear to be unrelated
to Total % of Par. The agent with
longer service has lower MDO sales and 
lower average sales but better persistency 
(lower lapsation rate) of business. His col- 
lection tasks are greater and he has a bigger 
“captive market” as indicated by the correla- 
tions of .30 with WP debit size and by an- 
other study (not otherwise reported herein) 
which shows significant differences between 
agents of 5 or more years’ service and agents 
with 6-12 months’ service. This latter study 
shows that the longer service agents have 
smaller MDO sales, average sales, and lapsa- 
tion; larger WP and MDO debit size. These 
results are based on a sample only partially 
overlapping with that in the present study. 


Summary 


A reasonable conclusion from the data pre- 
sented in this report is that, insofar as the 
agents and company studied are representa- 
tive, a salesman’s belief in his product and 
his motivation are more important than tech- 
nical knowledge in determining how well he 
does his job. Length of service is unrelated 
to job success. 


Received May 23, 1956. 


Reference 


1. Baier, D. E., & Dugan, R. D. Tests and perform- 
ance in a sales organization. Personnel Psy- 
chol., 1956, 9, 17-26. 





Journal of Applied Psychology 
Vol. 41, No. 1, 1957 


The Relationship of Typographic Arrangement to the Learning of
Technical Training Material ¹


George R. Klare 


Ohio University 
William H. Nichols 


The RAND Corporation 


and Emir H. Shuford 


Lackland Air Force Base 


This study is the sixth in a series (3, 4, 5,
6) ² on the relationship of various communication
variables to the learning of technical
material. “Typographic arrangement,” as
used here, refers to two unusual methods of 
typography called “square span” and “spaced 
unit.” In the square-span presentation, first 
suggested by Andrews (1), units of material 
are presented in two-line blocks set apart by 
spacing. This arrangement, as compared to 
the standard, was found to increase slightly 
the comprehension of short written passages. 
Spaced unit, suggested by North and Jenkins 
(8), utilizes the single-line arrangement of 
regular typography but again separates the 
units by spacing and therefore is similar to 
“phrase reading” as sometimes used in reading
training. The use of spaced unit was
found to increase significantly the comprehension
and reading speed of subjects compared
to both standard and square-span typography.
A further study by Nahinsky (7),
using a tachistoscope for presentation rather 
than a normal reading situation, showed the 
square-span arrangement yielded significantly 
higher scores in span of visual comprehension 
than either standard or spaced-unit arrange- 
ments. 


¹ This research was supported in part by the United
States Air Force under Contract No. AF33(038)
25726, monitored by the Personnel and Training Research
Center. Permission is granted for reproduction,
translation, publication, use, and disposal in
whole and in part by or for the United States Government.
The data on which the study is based
were collected while the authors were at the University
of Illinois.

² The fifth study, on the effect of format organization
upon the learning of technical material, has
been submitted for publication.


Below are sample sentences in standard, 
square-span, and spaced-unit arrangements. 
1. Standard:

The remaining 30% to 40% must be dissipated
through the cooling system.

2. Square span:

The remaining    must be       through the
30% to 40%       dissipated    cooling system

3. Spaced unit:

The remaining 30% to 40%    must be dissipated    through the cooling system

One of the chief problems in the use of 
these arrangements appears to be the most 
effective grouping of material into units 
(called “thought units”) for presentation. 
No rules exist for such grouping, either in 
the two articles cited or in any other sources 
known to the authors. Nevertheless, as 
North and Jenkins state, perhaps the chief 
advantage of the proposed typographic rear- 
rangement lies in the thought unit grouping 
itself (see also 2). The present study serves
as a check on this hypothesis by using
“longer” and “shorter” sizes of units. In
addition, it assesses the effects of the experimental
arrangements upon reading efficiency,
immediate retention, and acceptability of
technical material when presented to subjects
of a wide range of ability.


Method 


Experimental materials. A 1,206-word
printed lesson from an aircraft mechanics
training course at Chanute Air Force Base
was used in this as in the previous five studies.
It consisted of a first half on the “induction
system” and a second half on the
“cooling system” of an aircraft engine. This
“Present” (standard typography) version was 
used in the study along with the experimental 
versions described below. 

The Present (P) version was first divided 
into thought units, with the units then being 
set up in both square-span and spaced-unit 
typography. Five persons independently at- 
tempted to determine the units for the entire 
lesson. The only directions to each were that 
such units frequently correspond to phrases, 
and should generally be within the apprehen- 
sion span of the reader. 

Very good general agreement was found 
among the five in the choice of units, the 
more common of the disagreements being a 
tendency for two of the persons to select 
longer units than the other three. It should 
be stated that final choices were determined 
primarily by agreement among the five per- 
sons, with the rules described briefly below 
arising from analyses of these agreements. 
Therefore, specific rules could not always be 
applied, and at times contradictions arose 
among others. In general, however, agree- 
ment among persons and with rules was high. 

The common rules generated by both 
longer and shorter units were: (a) technical 
terms consisting of more than one word were 
not broken into two units; (b) thought units
were never arbitrarily broken because of lack
of space at the right margin of the paper;
and (c) existing punctuation was used and 
therefore determined units to some extent 
(i.e., no mark of punctuation was ever used 
inside a thought unit). 

For the shorter units, the chief rules were: 
(a) subject and predicate of simple sentences 
were separated, and object was also separated 
from predicate; (b) phrases (chiefly prepo- 
sitional) were set off; (c) noun modifiers, if 
short, were linked with the noun, and verb 
modifiers with the verb, but single subjects 
or objects stood alone; and (d) clauses were 
set off and, if long, broken into appropriate 
thought units. 

For the longer units, the chief differences 
were: (a) subject and predicate of short sim- 
ple sentences were not separated, but object 
was separated from predicate; (b) phrases
were set off, except in short sentences (when 


they were linked chiefly with the verb); (c) 
single words at beginning or end of sentences 
(i.e., used chiefly as subject or object) were 
joined with adjacent thought units; and (d) 
short clauses were not broken. 

The total number of units in the shorter unit 
version was 369, and in the longer unit, 281; 
the median numbers of words per unit were
2.73 and 3.66, respectively, for the two ver- 
sions. The shorter units were given the desig- 
nation A, so that the square-span version us- 
ing them was coded SSA and the spaced-unit 
version SUA. The longer thought units were 
termed B, resulting in SSB and SUB. Ex- 
cept for the typographic arrangement and the 
small code letters placed inconspicuously in 
the upper right-hand corner of the first page, 
all versions were the same in format com- 
pared to each other and to the P (standard) 
version. 
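
The unit counts can be related to the length of the lesson with simple arithmetic. The sketch below derives the mean number of words per unit from the 1,206-word total, on the assumption that every word of the lesson belongs to exactly one thought unit. The article reports medians (2.73 and 3.66), which need not coincide with these means.

```python
lesson_words = 1206  # length of the printed lesson

# Number of thought units in each version, as reported in the text.
unit_counts = {"A (shorter units)": 369, "B (longer units)": 281}

# Mean words per unit; the medians reported in the article may differ.
mean_words_per_unit = {
    version: lesson_words / count for version, count in unit_counts.items()
}

for version, mean in mean_words_per_unit.items():
    print(f"{version}: {mean:.2f} words per unit on average")
```

A mean above the median in both versions would be consistent with a few long units pulling the average upward, though the article does not report the full distribution.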

Every fifth line of each version was num- 
bered as an aid in obtaining total number of 
words read by subjects (see below for further 
information). Each subject also recorded a 
letter (coded to indicate time) after each 
complete reading of the lesson. 

A 50-item multiple-choice test was used to 
measure immediate retention, as in the previ- 
ous studies. The items were selected from a 
pool in such a way that each paragraph of 
the reading material was allotted a percentage
of items proportional to its size. The split- 
half reliability of the test was .87. 
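
Allotting test items in proportion to paragraph size can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the paragraph word counts are hypothetical (chosen only so that they sum to 1,206), and the largest-remainder rounding rule is one common way to make whole-number allocations add up to exactly 50.

```python
def allocate_items(paragraph_words, total_items=50):
    """Allocate test items to paragraphs in proportion to their length.

    Uses the largest-remainder method: every paragraph first gets the
    whole-number part of its proportional share, and leftover items go
    to the paragraphs with the largest remainders.
    """
    total_words = sum(paragraph_words)
    # Integer arithmetic avoids floating-point rounding surprises.
    shares = [total_items * w // total_words for w in paragraph_words]
    remainders = [total_items * w % total_words for w in paragraph_words]
    leftover = total_items - sum(shares)
    by_remainder = sorted(
        range(len(paragraph_words)), key=lambda i: remainders[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


# Hypothetical paragraph lengths in words (not taken from the article).
paragraph_words = [180, 240, 150, 210, 126, 300]

allocation = allocate_items(paragraph_words)
print(allocation, "total:", sum(allocation))
```

Any rounding rule that preserves the total would serve; the largest-remainder rule is used here because it keeps each paragraph within one item of its exact proportional share.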

Acceptability of the various typographic 
versions (except P, which was not of concern) 
was determined by answers to two questions 
printed on the back of the answer sheet for 
the test. The first question asked subjects to 
indicate which was more pleasant to read, the 
presentation used in the lesson or that nor- 
mally used in printed material. The second 
question asked why the particular presentation
was chosen and listed four possible reasons:
(a) “is more familiar,” (b) “is more
interesting,” (c) “is easier to read,” and (d) 
“permits faster reading.” Subjects were asked 
to check all reasons considered important, 
and to double check the most important one. 

Subjects and procedure. The subjects used 
were 533 male airmen in indoctrination train- 
ing at Sampson Air Force Base (107 subjects 







each for the P, SUA, and SUB versions, and 
106 each for the SSA and SSB versions). 
Various aptitude indices, based on a scale of 
1 to 9 and conveniently called “stanines,”
were available for each airman, and differ- 
ences in aptitude of the experimental groups 
were accounted for in subsequent analyses. 

After the various versions were randomly 
distributed, each subject read a set of in- 
structions printed in the same typographic 
arrangement as the particular version he had. 
All subjects were given 25 minutes to read 
the lesson, and were instructed to re-read it, 
in a normal manner, as many times as pos- 
sible before the signal to stop was given. 
During this period they indicated the amount
read in the way previously described. The 
acceptability questions were next answered, 
and following this, 40 minutes were allowed 
for answering the test. 


Results 


Efficiency of reading. Reading efficiency 
for the various typographic arrangements was 
measured both in terms of amount of time 
(number of minutes) spent in a first reading 
of the lesson and in terms of total number of 
words read (in the 25-minute experimental 
period). Several analyses were made for each 
of these dependent variables, based upon the 
scores of groups of subjects comparable in 
ability. 

For time spent in first reading, a 3 × 8
analysis of variance (P, SUA, and SSA typographic
arrangements × 8 levels of aptitude)
indicated significant variance attributable to 
typographic arrangement and to aptitude. 
The mean times spent were P (standard 
arrangement) = 7.48 minutes, SUA (spaced 
unit with shorter thought units) = 7.75 min- 
utes, and SSA (square span with shorter 
units) = 9.05 minutes. A similar 3 × 8
analysis of variance using the P, SUB, and 
SSB arrangements (spaced unit and square 
span with longer thought units) again showed 
significant variance attributable to both typo- 
graphic arrangement and to aptitude. The 
means again showed that the standard ver- 
sion was read approximately one-fourth min- 
ute faster than the spaced-unit and upwards 
of one minute faster than the square-span
arrangement. Since it appeared that the
shorter thought units resulted in slightly 
greater reading speed than the longer ones, 
a 2 × 2 × 8 analysis of variance was computed,
using longer and shorter units ×
spaced-unit and square-span arrangements ×
8 levels of aptitude. The variances attributable
to both arrangement and aptitude were
significant, but that due to thought-unit size 
was not. 
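
As a rough illustration of the size of these differences, the three means reported above for the shorter-unit versions can be compared directly with the standard arrangement. The sketch below uses only those three reported means; the function name and the percentage figure are conveniences added here and are not part of the original analysis.

```python
# Mean minutes spent on a first reading, as reported for P, SUA, and SSA.
mean_first_reading_min = {"P": 7.48, "SUA": 7.75, "SSA": 9.05}


def slowdown_vs_standard(means, baseline="P"):
    """Return {version: (extra minutes, percent longer)} relative to the baseline."""
    base = means[baseline]
    return {
        version: (round(value - base, 2), round(100 * (value - base) / base, 1))
        for version, value in means.items()
        if version != baseline
    }


slowdowns = slowdown_vs_standard(mean_first_reading_min)
for version, (extra, pct) in slowdowns.items():
    print(f"{version}: {extra} min slower than P ({pct}% longer)")
```

Such a comparison describes the means only; whether a difference is significant is a question for the analyses of variance reported in the text.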

Similar analyses were made for total num- 
ber of words read in the 25-minute experi- 
mental period. A comparison of the mean 
values showed that the standard (P) arrange- 
ment resulted in approximately 25 more words 
read than the spaced-unit (SUA and SUB) 
and 265 more than the square-span (SSA and 
SSB) arrangements. A 3 × 8 analysis of
variance, P, SUA, and SSA arrangements × 8
levels of aptitude, indicated, however, that 
only the variance attributable to aptitude was 
significant. A similar analysis of the P, SUB, 
and SSB versions also yielded significant variance
only for aptitude. A 2 × 2 × 8 analysis
of variance, longer and shorter thought
units × square span and spaced unit × 8
levels of aptitude, again indicated that only 
the variance attributable to aptitude was sig- 
nificant. 

It is interesting to note that in all three 
analyses involving time spent in a first read- 
ing of the lesson material typographic ar- 
rangement accounted for a significant por- 
tion of the variance; in all three analyses of 
number of words read in 25 minutes, how- 
ever, the variance attributable to typographic 
arrangement was not significant. 

Acceptability. Acceptability was deter- 
mined by answers to the question of which 
was more pleasant to read, the experimental 
arrangement used or the arrangement nor- 
mally found in printed material. A compari- 
son of preferences by percentages indicated 
that the normal arrangement was significantly 
preferred (.01 > p > .001) to the SUB, SSA, 
and SSB arrangements, but not to SUA 
(shorter spaced unit). Further analyses of 
the preference data showed that (a) the 
spaced-unit arrangement was more acceptable 
(p < .01) than the square-span, for both
shorter and longer thought units, and (b)
the shorter thought units were preferred to
the longer (p < .02) for both typographic 
arrangements. Of the four reasons listed for 
the indicated preferences (familiarity, inter- 
est, ease of reading, and speed of reading), 
subjects preferring the experimental arrange- 
ments to the normal chose “ease” as of pri- 
mary importance; the subjects who preferred 
the normal arrangement gave “familiarity” as 
the primary reason but considered “ease”
closely secondary in importance. 

Immediate retention test score. Analysis 
of scores on the 50-item test showed that 
the mean differences between the standard, 
spaced-unit, and square-span arrangements 
were small. A 3 × 8 analysis of variance, P,
SUA, and SSA arrangements × 8 levels of
aptitude, yielded significant variance attribut- 
able only to aptitude. A similar analysis in- 
volving P, SUB, and SSB (longer units) 
yielded significant variance attributable to 
both aptitude and interaction. Examination 
of the mean scores suggests that the signifi- 
cant interaction effect may have been due to 
high ability subjects receiving higher scores 
and low ability subjects lower scores on the 
experimental arrangements than subjects of 
similar ability on the standard arrangement. 


Discussion 


The results of this study, to be most mean- 
ingful, will be discussed in terms of the seem- 
ingly contradictory results found by previous 
investigators. It will be recalled that An- 
drews (1), in the earliest study, found the 
square-span arrangement to provide a slight 
increase in comprehension compared to a 
normal typographic arrangement; North and 
Jenkins (8) found spaced unit superior to 
both square-span and normal arrangements 
in terms of increases in reading speed and 
comprehension; Nahinsky (7), using tachis- 
toscopic presentation rather than the normal 
reading situation, found square-span superior 
to both spaced-unit and normal presentations 
when span of visual comprehension was used 
as the dependent variable. 

The results of North and Jenkins and of 
Nahinsky can be reconciled in terms of the 
differences in the comprehension measures 
used. Comprehension, used by North and
Jenkins, refers to test items answered correctly;
span of visual comprehension, used
by Nahinsky, refers to number of words of 
material reported correctly after a brief ex- 
posure. It seems reasonable that square span 
might provide an advantage in span because 
it adds a vertical dimension to the restricted 
horizontal span developed in reading, but 
only when the situation is novel (as with a 
tachistoscope). In the normal reading situa- 
tion, the strongly developed habit of ignoring 
the lines of print above and below that being 
read cannot easily be broken, and therefore 
square span is at a disadvantage, as North 
and Jenkins found. 

The present study suggests the important 
part played by practice where habits as 
strongly developed as those in reading are 
involved. Spaced unit, a much less radical 
departure than square span from the normal 
typographic arrangement, had only a slight 
effect upon reading speed; square span 
slowed reading speed significantly at first, 
but nonsignificantly as the experimental pe- 
riod wore on. North and Jenkins had found 
no difference with two limited degrees of prac- 
tice, but they point out that their material 
was nontechnical and only moderately diffi- 
cult. In the present study the material was 
not only technical but was also read by sub- 
jects of a wide range of ability, only the more 
able of whom would appear to correspond to 
North and Jenkins’ college student subjects. 

The difference in ability of the subjects 
used may well explain why North and Jen- 
kins found spaced unit to provide clearly in- 
creased comprehension scores where the pres- 
ent study did not. It will be remembered 
that there was some evidence that the more 
able subjects did profit from the spaced-unit 
and square-span arrangements, and these sub- 
jects would be roughly comparable to college 
students in ability. 

In reviewing these studies, it would appear 
that square span possesses two potential ad- 
vantages over the normal typographic ar- 
rangement, in that it might provide an in- 
crease in (a) span of visual comprehension 
with intensive training, and (b) comprehension
scores due to the provision of thought
units; the spaced-unit arrangement, on the
other hand, possesses only the latter. The
potential advantage that might be provided 
by thought units is little known, of course, 
because relatively little is known about how 
best to set them up. The present study sug- 
gests that shorter units are more acceptable 
than longer units, at least to relatively un- 
practiced readers. Further work on the de- 
velopment of rules for setting up thought 
units is clearly indicated. 


Summary 


The results of this study indicated that two 
newly developed methods of typographic ar- 
rangement, square span and spaced unit, may 
possess certain advantages over the usual ar- 
rangement. While square span slowed the 
reader on first encounter, this effect tended to 
diminish with practice; spaced unit had little 
effect upon reading speed. The reader found 
the newer arrangements less acceptable than 
the more traditional, but this feeling was less 
marked when the “thought units” in the arrangement
were small rather than large. The
chief effect upon immediate retention produced
by the newer arrangements compared
to the older was to provide an increase in test
scores for the more able readers. It should
be emphasized that the advantages of the
newer arrangements are best described as potential,
since they interfere with strongly developed
reading habits. This study indicated,
however, that these arrangements may
be of value for subjects who have had some 
practice in reading them and/or high ability. 
Further work in the setting up of thought 
units was also indicated. 


Received April 19, 1956. 


References 


1. Andrews, R. B. Reading power unlimited. Tex.
Outlook, 1949, 33, 20-21.

2. Klare, G. R., & Buck, B. Know your reader.
Camden, N. J.: Thomas Nelson, 1954.

3. Klare, G. R., Gustafson, L. M., Mabry, J. E., &
Shuford, E. H. The relationship of immediate
retention of technical training material to
career preferences and aptitudes. J. educ.
Psychol., 1955, 46, 321-329.

4. Klare, G. R., Mabry, J. E., & Gustafson, L. M.
The relationship of human interest to immediate
retention and to acceptability of technical
material. J. appl. Psychol., 1955, 39, 92-95.

5. Klare, G. R., Mabry, J. E., & Gustafson, L. M.
The relationship of patterning (underlining)
to immediate retention and to acceptability of
technical material. J. appl. Psychol., 1955,
39, 40-42.

6. Klare, G. R., Mabry, J. E., & Gustafson, L. M.
The relationship of style difficulty to immediate
retention and to acceptability of technical
material. J. educ. Psychol., 1955, 46, 287-295.

7. Nahinsky, I. D. The influence of certain typographical
arrangements upon span of visual
comprehension. J. appl. Psychol., 1956, 40,
37-39.

8. North, A. J., & Jenkins, L. B. Reading speed and
comprehension as a function of typography.
J. appl. Psychol., 1951, 35, 225-228.





Journal of Applied Psychology 
Vol. 41, No. 1, 1957 


Some Personal and Social Attitudes of Habitual Traffic 
Violators ¹


Harry W. Case and Roger G. Stewart

Institute of Transportation and Traffic Engineering,
University of California, Los Angeles


Popular opinion and legal usage have as- 
sumed that the negligent operator of an auto- 
mobile or habitual traffic violator is a sepa- 
rate social entity who may be expected to 
exhibit a concentric attitudinal matrix to- 
ward the following: (a) the law, (b) the en- 
forcement agency, (c) concept of self, and 
(d) factors pertaining to the elimination of 
the condition. Furthermore, it is assumed 
that habitual violators as a group deviate 
considerably from a normal population of 
drivers in various personal and socioeconomic 
characteristics. The philosophy that under- 
lies these views is that traffic violations con- 
stitute deviate events in a social environment. 
If an individual receives relatively frequent
traffic citations, then it is generally believed
that he must be an “abnormal” or “peculiar”
member of society in some respects. Therefore,
it may be supposed that such individuals,
who may be defined legally as “negligent
operators,” tend to constitute a socially unique 
group in various personal and social traits. 
Proof or disproof as to whether or not negli- 
gent operators do constitute a socially unique 
group in attitude orientation, as well as in 
other dimensions, is an important factor in 
the successful handling of them by social 
agencies. 

¹ The authors wish to express their appreciation to
the following individuals and agencies for their cooperation
during various phases of this study: Mr.
Parks Stillwell, Judge of the Municipal Court of the
Los Angeles Judicial District; Mr. Paul Mason, Director,
and Mr. Fred P. Williams, Chief of Division
of Drivers Licenses, Department of Motor Vehicles,
State of California; Dr. T. H. Southard and Mr.
F. H. Hollander, Numerical Analysis Research, Department
of Mathematics, University of California,
Los Angeles.

In order to provide answers to questions in
this area, a study of negligent operators in
Los Angeles was undertaken by the Institute
of Transportation and Traffic Engineering,
University of California, Los Angeles. The 
Vehicle Code of California (3) defines a neg- 
ligent operator as follows: “Any person who 
has been convicted on four or more occa- 
sions in a consecutive period of 12 months, 
or six or more occasions within a consecu- 
tive period of 24 months, or eight or more 
occasions within a consecutive period of 36 
months, of violations of the provisions of the 
Vehicle Code involving the safe operation of 
vehicles on the highway . . . shall be prima 
facie presumed to be ‘a negligent operator of 
a motor vehicle.’” Three hundred individuals
who had become negligent operators in
Los Angeles during 1952-1953, according to 
the Vehicle Code of California, were selected 
as subjects. These individuals were given an 
informal but carefully structured interview 
subsequent to their most recent arrest but 
prior to their hearing in traffic court. The 
interviewers were two Ph.D. candidates in 
Psychology attending the University of Cali- 
fornia, Los Angeles, who had had extensive 
interviewing experience. From the interview 
and a list of recorded violations of each 
subject, the interviewers obtained data on 
various personal and socioeconomic factors 
surrounding the violator and his previous 
violations. Some statistical comparisons using 
chi square were made of independent inter- 
view reports of the two interviewers. These 
comparisons showed that no significant varia- 
tions occurred between their reports. In a 
previous paper (2), detailed statistics were 
presented for the entire group of 300 sub- 
jects on several personal and socioeconomic 
variables and on various types of traffic viola- 
tions. Except for the frequencies of traffic 
violations, the group showed reasonably nor- 
mal distributions on all variables. 
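
The quoted definition amounts to three sliding-window counts over a driver's conviction dates. A minimal sketch of such a check follows, offered only as an illustration of the rule as quoted; the function names are hypothetical, and the treatment of a “consecutive period” as whole calendar months measured from each conviction (with the day of the month capped at 28) is an assumption, not something the Vehicle Code excerpt specifies.

```python
from datetime import date

# (minimum convictions, window length in months), taken from the quoted definition.
THRESHOLDS = [(4, 12), (6, 24), (8, 36)]


def add_months(d, months):
    """Shift a date forward by whole calendar months (day capped at 28; an assumption)."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def is_negligent_operator(conviction_dates):
    """True if any window starting at a conviction meets one of the thresholds."""
    dates = sorted(conviction_dates)
    for min_count, months in THRESHOLDS:
        for i, start in enumerate(dates):
            end = add_months(start, months)
            in_window = sum(1 for d in dates[i:] if d < end)
            if in_window >= min_count:
                return True
    return False


# Hypothetical records, for illustration only.
four_in_a_year = [date(1952, 1, 5), date(1952, 4, 9), date(1952, 8, 20), date(1952, 11, 2)]
six_in_two_years = [date(1951, 1, 10), date(1951, 5, 10), date(1951, 9, 10),
                    date(1952, 2, 10), date(1952, 6, 10), date(1952, 9, 10)]
three_only = four_in_a_year[:3]
```

The second hypothetical record never has four convictions inside any 12-month window, but it does reach six within 24 months, so it satisfies the definition through the second clause.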







Method 


The 300 negligent operators were treated in ac- 
cordance with two experimental designs—100 were 
used for a study of responses to attitude questions 
and the total of 300 for compilation of sociometric 
data. Within each design, several hypotheses were 
formulated and tested concerning the direction and 
extent of the relationship between various pairs of 
variables. 


Selection of Variables 


From the interview data and previous violation 
history of the subjects, a total of 21 variables was 
chosen for statistical analysis. These variables have 
been classified as personal variables, violation vari- 
ables, and attitude variables. 

Personal variables. Information on seven per- 
sonal or socioeconomic variables was obtained from 
the interview with each operator. These variables 
were the following: (a) age, (b) marital status, (c) 
number of dependents, (d) national and racial origin, 
(e) occupational skill level, (f) number of years 
lived in California, and (g) estimated number of 
miles driven per day. 

Violation variables. Frequencies of violations and 
citations were obtained on nine violation variables 
from the most recent arresting citation and the list 
of recorded violations for the preceding year. First, 
the frequencies were tabulated as (a) moving viola- 
tions, (b) nonmoving violations, and (c) total vio- 
lations. Then, the individual frequencies of (d) 
speeding violations, (e) stop-signal violations, (f) 
stop-sign violations, and (g) stop-signal plus stop- 
sign violations were obtained. Finally the number 
of (h) citations involving moving violations, and the
(i) total number of citations were also determined 
for each subject. The frequency counts for these 
nine variables consisted of all recorded violations of 
the Vehicle Code of California committed by the 
subjects in California during their most recent year 
of driving prior to their hearing in traffic court. 

Attitude variables. For the last 100 cases, more 
detailed interview data were obtained, consisting 
largely of responses to several general attitude ques- 
tions. These attitude variables were named attitude 
toward the law, attitude toward the police, self-rat- 
ing as a driver, suggested self-improvement as a 
driver, and reason for pleading “guilty” or “not 
guilty” in court. Since only two of the 100 cases 
pleaded “not guilty,” this part of the last attitude 
variable was disregarded. 


Procedure 


The 300 negligent operators were divided by odd 
vs. even code numbers (numbers assigned in se- 
quence when the individuals were interviewed) into 
two samples of 150 subjects each, which were called 
the Alpha and Beta samples. The Alpha sample was 
used solely for testing hypotheses and the Beta sample
for cross validation. In the second design involving
the last 100 subjects, the two groups were
called the Alpha Clinical and Beta Clinical samples,
respectively.
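
The odd-even split described above can be sketched in a few lines. The code numbers 1 through 300 and the assignment of odd numbers to Alpha (rather than Beta) are assumptions made for illustration; the text states only that the division was by odd vs. even code numbers.

```python
def split_alpha_beta(code_numbers):
    """Split sequential code numbers by parity (odd -> Alpha is an assumption)."""
    alpha = [n for n in code_numbers if n % 2 == 1]
    beta = [n for n in code_numbers if n % 2 == 0]
    return alpha, beta


# Hypothetical code numbers assigned in interview order.
codes = list(range(1, 301))
alpha, beta = split_alpha_beta(codes)

# Second design: the last 100 subjects interviewed, split the same way.
alpha_clinical, beta_clinical = split_alpha_beta(codes[-100:])
```

Because code numbers were assigned in interview sequence, a parity split of this kind keeps the two halves balanced over the period of data collection.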

Data for all variables were recorded in detailed 
IBM cards for each subject. Responses to interview 
questions were classified item by item into meaning- 
ful categories by two psychologists before being 
transferred to IBM cards. 

Considering the Alpha and Beta samples separately 
within each design, dichotomies were established for 
all 21 variables. Numerical variables were dichoto- 
mized nearest their median and nonnumerical vari- 
ables according to accepted social or psychological 
classifications. For the variables of attitude toward 
the law and toward the police, the responses were 
classified as “favorable” or “unfavorable.” For the 
remaining three attitude variables, the responses 
were considered as “favorable toward self” or “un- 
favorable toward self.” All responses were classified 
independently of any knowledge of individual standings
in other variables. All hypotheses were formulated
and tested in terms of dichotomized variables.

Hypotheses 


Hypotheses concerning possible associations be- 
tween variables were formed on the basis of results 
from studies of other traffic violators and various 
popular and expert opinions. Rather than enumerate
all of the 143 hypotheses tested in this investigation,
it will be necessary simply to indicate
the manner in which they were formed and the subjects
on whom they were tested.

First experimental design. The data on the per- 
sonal and violation variables for the 300 negligent 
operators constitute the first experimental design.
Hypotheses were made that more violations and 
citations than the median, as opposed to fewer than 
the median, would be associated with the seven personal
variables as follows: (a) younger age, (b) being
unmarried, (c) fewer dependents, (d) nonwhite
national and racial origin, (e) unskilled occupational
status, (f) fewer years lived in California, and (g) 
more miles driven per day. From the pairing of 
these seven personal with the nine violation vari- 
ables, 63 hypotheses were formed concerning per- 
sonal variables and the distributions of violations 
and citations in the Alpha sample of 150 cases. 

Second experimental design. Concerning the attitude
variables for the last 100 cases which fall within
the second experimental design, the hypotheses were 
made that the personal variables as indicated above 
would be associated with unfavorable attitudes to- 
ward the law and the police and with favorable atti- 
tudes toward self. Regarding the violation variables, 
it was hypothesized that more violations and cita- 
tions than the median, as opposed to fewer than the 
median, would be associated with the attitude variables
as just indicated. From the pairing of the
five attitude variables with the seven personal and
the nine violation variables, 80 hypotheses were formulated
concerning the attitude variables of the 50
cases from the Alpha sample in the second experimental
design.
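
The counts of 63 and 80 hypotheses follow directly from pairing the variable sets, as the short sketch below shows. The variable labels are abbreviations of those listed under Selection of Variables; the code itself is an illustration added here, not part of the original procedure.

```python
from itertools import product

personal = ["age", "marital status", "dependents", "national and racial origin",
            "occupational skill level", "years in California", "miles per day"]
violation = ["moving", "nonmoving", "total violations", "speeding", "stop-signal",
             "stop-sign", "stop-signal plus stop-sign", "moving citations",
             "total citations"]
attitude = ["law", "police", "self-rating", "self-improvement", "reason for plea"]

# First design: each personal variable paired with each violation variable.
first_design = list(product(personal, violation))

# Second design: each attitude variable paired with each personal or violation variable.
second_design = list(product(attitude, personal + violation))

total_hypotheses = len(first_design) + len(second_design)
```

Here `len(first_design)` is 7 × 9 = 63, `len(second_design)` is 5 × 16 = 80, and the total is 143, the figure cited in the text.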







Table 1

Significant Chi-Square Values for Hypotheses of
Personal and Violation Variables

                                            Chi-Square Value
                                         Alpha Sample  Beta Sample
Hypothesis                                (N = 150)     (N = 150)

Younger drivers have more speeding
violations than older drivers
(Supported in Alpha and Beta samples)

Unmarried drivers have more speeding
violations than married drivers
(Supported in Alpha sample)

Nonwhite drivers have more nonmoving
violations than white drivers
(Supported in Alpha sample)                                0.02

* Significant at the .01 level.


All hypotheses were tested using the chi-square 
test of independence for the fourfold tables based 
on pairs of dichotomized variables. The two de- 
signs permitted the test of 143 hypotheses. 
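
For readers unfamiliar with the computation, the chi-square statistic for a fourfold table with cell frequencies a, b, c, d can be obtained from the standard shortcut formula N(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)]. The sketch below applies it without a continuity correction; whether the authors applied such a correction is not stated, so that choice is an assumption, and the cell frequencies shown are invented.

```python
# Critical values of chi square for 1 degree of freedom.
CRITICAL_05 = 3.84
CRITICAL_01 = 6.63


def chi_square_fourfold(a, b, c, d):
    """Chi square (1 df, no continuity correction) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical table: rows = younger/older, columns = more/fewer speeding violations.
chi_sq = chi_square_fourfold(30, 20, 20, 30)
significant_at_05 = chi_sq > CRITICAL_05
significant_at_01 = chi_sq > CRITICAL_01
```

With these invented frequencies the statistic is 4.0, which exceeds 3.84 but not 6.63. The statistic alone does not show the direction of an association; as in the study, the frequencies must be inspected to see whether the association runs in the hypothesized direction.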


Results 
First Experimental Design 


For the Alpha sample of 150 cases, using 
pairs of the seven personal with the nine vio- 
lation variables, the chi-square values for 
only three hypotheses of the 63 hypotheses 
tested were significant at the .05 level of con- 
fidence. One of these three hypotheses was 
also supported in the cross-validation sample 
at the .05 level. These results are shown in 
Table 1. 

In each instance, the direction of the asso- 
ciation between pairs of variables was the di- 
rection expected according to the hypothesis. 
The association between age and number of 
speeding violations on record during the previ- 
ous year was the only significant finding in 
the two experimental designs which was veri- 
fied by the cross-validation sample. This re- 
sult tends to support some popular views and 
frequently expressed opinions of enforcement 
authorities on speeding habits and arrests of 
younger drivers. However, our findings should 
not be regarded as proof that younger people
drive at higher speeds or commit more
speeding violations than other drivers. Several
factors such as variations in enforcement
practices and individual driving exposure may 
account—in part at least—for this finding. 

Concerning the other two hypotheses, the 
initial significant values of chi square were 
caused apparently by chance. 

Inasmuch as just one out of the 63 hy- 
potheses was supported in both the initial 
and cross-validation samples, one would be 
unable to predict from the personal variables 
with reasonable accuracy which violators 
would have had more or fewer violations or 
citations of the classes considered during the 
preceding year than other violators. The one 
exception is that the drivers younger than the 
median age of the sample received citations 
for more speeding violations than the older 
drivers. 


Second Experimental Design 


For the Alpha Clinical sample of 50 cases, 
pairing each attitude variable with each per- 
sonal and violation variable, the chi-square 
values for just six hypotheses out of the 80 
hypotheses tested were significant at the .05 
level of confidence. These results are dis- 
cussed separately for the personal and the 
violation variables. 

Personal vs. attitude variables. Table 2
shows the two instances in which one of the
attitude variables was significantly associated
with one of the personal variables in the
Alpha Clinical sample. Neither result was
upheld in cross validation.


Table 2

Significant Chi-Square Values for Hypotheses of
Personal and Attitude Variables

                                            Chi-Square Value
                                       Alpha Clinical  Beta Clinical
                                           Sample         Sample
Hypothesis                                (N = 50)       (N = 50)

Nonwhite drivers have unfavorable
attitudes toward the police
(Rejected in Alpha Clinical sample)

Drivers with fewer dependents plead
“guilty” for reasons favorable
toward self (Rejected in Alpha
Clinical sample)                            6.16*          2.43

* Significant at the .05 level.


In both instances above, the direction of 
the frequencies in the fourfold table was re- 
versed to that which had been hypothesized. 
Each hypothesis was, therefore, rejected in
the Alpha Clinical sample with values of chi 
square well beyond the critical value of 3.84 
at the .05 level. However, since the results 
were neither supported nor rejected in the 
Beta Clinical sample, no explanation is of- 
fered concerning the two possible associa- 
tions. 

Violation vs. attitude variables. Table 3 
shows the four instances in which one of the 
attitude variables was significantly associated 
with one of the violation variables in the 
Alpha Clinical sample. In the cross-valida- 
tion sample, none of these results was simi- 
larly supported. 

In the first three instances above, the di- 
rection of the frequencies in the fourfold table 
was reversed to that which had been hypothe- 
sized. Most interesting, perhaps, are the two 
comparisons which involve attitude toward 
police. We find that violators with attitudes 
toward the police classified as favorable tend 
to have been charged with more nonmoving 
and total violations than the other violators. 
While neither result was supported in the 
Beta Clinical sample, the association of fa- 
vorable attitudes toward the police with more 
nonmoving violations yielded a chi-square
value of 3.69, shown in Table 3, which just 
barely falls short of 3.84, the value required 
for significance at the .05 level. One possible 
explanation of these results is that the fa- 
vorable attitudes reflect more personal ex- 
perience with the police, even though the 
additional personal experience came through 
receiving traffic citations. In support of this 
explanation, it is well known in social psy- 
chology and related fields that prejudice to- 
ward certain individuals or minority groups 
often becomes diminished after one has had 
additional personal experience with them or 
has learned more about them. In the traffic 
field, similar processes may operate in some 
individuals to produce attitude changes to- 


Table 3 


Significant Chi-Square Values for Hypotheses of 
Violation and Attitude Variables 


Chi-Square Value 


Beta 
Clinical 
Sample 
(N = 50) 


Alpha 
Clinical 
Sample 


Hypothesis (N = 50) 


Drivers with more nonmoving vio 
lations have unfavorable attitudes 
toward the police (Rejected in 
Alpha Clinical sample) 

Drivers with more total violations 
have unfavorable attitudes toward 
the police (Rejected in Alpha 
Clinical sample) 

Drivers with more total citations 
suggest self-improvement favor 
able toward self (Rejected in Al 
pha Clinical sample) 

Drivers with more stop-signal vio 
lations give unfavorable self 

rating as driver (Supported in 

Alpha Clinical sample) 


* Significant at the .05 level 


ward the police and other enforcement offi 
cials. 

In the other two comparisons shown in 
Table 3, the initial significant values of chi 
square were caused apparently by chance.


Discussion and Summary 

The results indicate that the negligent operators
do not constitute a homogeneous group
with respect to either the personal variables
or the attitudes expressed toward the law, the
police, or themselves. The associations between
the attitudes expressed by interview
responses and the other variables are not
stronger than one would expect by chance.
The favorableness or unfavorableness of the
attitudes seems to have no consistent relationship
with frequencies of violations or citations
for the preceding year. The results,
therefore, fail to support the common belief
that drivers who have unfavorable attitudes
toward self or society become serious traffic
violators and that such violators have developed
unfavorable attitudes toward traffic laws,
enforcement agencies, or themselves as a result
of frequent apprehension for traffic violations.
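
The statement that the associations are no stronger than chance would produce can be made concrete by comparing the number of significant results with the number expected when every hypothesis is false. The sketch below does this for the counts reported in Results (3 of 63 and 6 of 80 significant in the initial samples). It treats the tests as independent, which is an assumption made here for simplicity; the actual tests share variables and are not strictly independent.

```python
from math import comb


def expected_by_chance(n_tests, alpha=0.05):
    """Expected number of 'significant' results if no real association exists."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha=0.05):
    """Binomial probability of k or more significant results by chance alone."""
    return sum(
        comb(n_tests, j) * alpha**j * (1 - alpha) ** (n_tests - j)
        for j in range(k, n_tests + 1)
    )


# (tests run, significant in the initial sample), from the Results section.
designs = {"first": (63, 3), "second": (80, 6)}
for name, (n_tests, observed) in designs.items():
    print(name, observed, expected_by_chance(n_tests), prob_at_least(observed, n_tests))
```

The expected counts are 3.15 and 4.0, close to the 3 and 6 actually observed, which is consistent with the authors' reading that most of the initial significant values arose by chance and with the failure of all but one to hold up in cross validation.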

Of the 143 hypotheses tested in the two 
experimental designs, only one was supported
in both the initial and cross-validation sam- 
ples, this one being that subjects younger 
than the median age of the sample had re- 
ceived citations for more speeding violations 
during the year than the older individuals. 
These results, however, should be interpreted 
with caution for two reasons. First, the me- 
dian ages for the two samples were about 
27.5 and 29.5 years, respectively, or a few 
years beyond the teen-age period with which 
speeding is commonly associated. Second, 
the subjects with records of one or more 
speeding violations represented all age groups 
of the two samples. One possible explana- 
tion for the results is that speed laws may be 
enforced more often, or more strictly, for
younger drivers than for older ones. In ad- 
dition, it is conceivable that younger indi- 
viduals have greater driving exposure than 
older ones both in total mileage and in ex- 
posure to traffic police. 

In conclusion, it appears that the drivers 
used in this study are not different from other 
drivers in the characteristics considered, except
in the frequency of traffic violations on
their records. The negligent operators as a 
group appear to be normally functioning 
members of society except that they receive 
traffic citations for sufficient moving viola- 
tions in the Los Angeles area to become “neg- 
ligent operators” as defined by the Vehicle 
Code of California. 


Received December 27, 1955. 


References 


1. Case, H. W., & Stewart, R. G. Driving attitudes.
Traffic Quarterly, 1956, 10, 364-376.

2. Case, H. W., Reiter, I., Feblowicz, E. A., &
Stewart, R. G. The habitual traffic violator.
Washington: Highway Research Board, 1956,
Bull. No. 120.

3. Department of Motor Vehicles. State of California
Vehicle Code. Sacramento, 1953.

4. DeSilva, H. R. Why we have automobile accidents.
New York: Wiley, 1942.

5. Edwards, A. L. Experimental design in psychological
research. New York: Rinehart, 1950.

6. Eno Foundation for Highway Traffic Control.
Personal characteristics of traffic accident repeaters.
Saugatuck, Conn., 1948.

7. Eno Foundation for Highway Traffic Control.
The motor-vehicle driver: his nature and improvement.
Saugatuck, Conn., 1949.

8. McNemar, Q. Psychological statistics. New York:
Wiley, 1949.





Journal of Applied Psychology
Vol. 41, No. 1, 1957


An Operational Test of Laboratory Determined Optima of 
Screen Brightness and Ambient Illumination for Radar 
Reporting Rooms ¹


E. G. Bessey and G. S. Machen 


Defence Research Medical Laboratories, Toronto 


Laboratory investigations (1, 3) have in- 
dicated that the visibility of targets on a 
radar display is markedly affected by the 
bias of the cathode-ray tube (CRT). It has 
been shown (1, 4) that ambient illumination 
up to a level of 0.1 footcandle does not impair 
visibility. 

Most of this work was done in restricted 
laboratory conditions. The Ss knew where to 
expect the target, and were not distracted by 
false indications caused by circuit “noise.” 
Recently, the potential importance of the 
findings has been emphasized by an analysis 
(2) which proposes that change of visibility 
can be stated as a percentage of range. This 
analysis indicates that the improvement in 
maximum radar range to be expected through 
the use of optimal conditions is of the order 
of 30 per cent.

The experiment reported here is a test of 
the validity of the laboratory findings under 
operational conditions. Six regular service 
operators worked at a radar site. They 
searched for targets, reported their positions 
for plotting, and faced all the difficulties typical
of the operational situation.²


Experimental Design 


Apparatus. Trials were conducted at an opera- 
tional station using radar displays fitted with 12” 
CRTs. The displays were of the normal Plan-posi- 
tion Indicator (PPI), on which a radial bright line 
(“sweep-line”) rotates about the center, leaving be- 
hind it a zone of phosphorescence against which tar- 
gets appear as small bright patches, subtending about 
20 minutes of visual angle. Two windowless rooms 
were used. In each, the level of room lighting could 
be controlled by rheostat. 

Procedures. Voltmeters were wired into the display
units to measure the grid bias of the CRT. As
the bias becomes more negative, the sweep-line grows
dimmer, until it reaches a critical value at which it
can just be seen. The six operators and the two Es,
after dark adaptation for five minutes, each determined
this critical voltage; the mean voltage reading
was taken as Visual Reference Intensity (VRI).
All experimental settings of CRT bias were made
relative to VRI.


1 Defence Research Medical Laboratories Report
No. 163-5, Project No. D77-94-20-22 (H.R. No.
114).

2 The satisfactory execution of this trial is due
largely to the high degree of cooperation of RCAF
Air Defence Command and the personnel of No. 14
Aircraft Control and Warning Squadron.

In what will be called the Dark Scope Room 
(DSR), the bias was set at VRI to give a just 
visible sweep-line. In this room, no ambient light- 
ing was permitted except the slight glow from the 
scope face and from the dimly illuminated opera- 
tional clock. 

In the Bright Scope Room (BSR), the bias was 
set 7 volts positive from VRI. This has been re- 
ported (1) as an optimum for CRTs of this type; 
it produces a relatively bright display. Room illu- 
mination was set to give a reading of 0.1 footcandle 
at the center of the PPI, as measured with a Mac- 
beth Illuminometer. Operators were not permitted 
to make any adjustments to the scopes. To control 
interdisplay differences, each room was operated for 
the same number of hours under each condition. 

Six Fighter Control Operators, four male and two 
female, were Ss. Each S worked for thirty minutes 
at a time, with a minimum of one hour between 
sessions. The Ss were required to report, in azimuth 
and range, the positions of all aircraft seen, using 
RCAF Standard Operating Procedures. All reports 
were recorded on magnetic tape. The recordings
were later transcribed by the Es, and plotted on
scaled paper to yield aircraft “tracks” (i.e., the succession
of reported and plotted positions which
could be identified with an aircraft and which described
its course).

To randomize subject differences, each S was
paired with every other during the trial. The Ss
were scheduled to arrive five minutes before their 
session began. They were briefed on the situation, 
so that no tracks were lost during the hand-over 
period. 


Results 


During the trial, 19.2% more plots were 
recorded in the BSR than in the DSR. This 
is a highly significant difference (p < .001). 

A more revealing analysis was based on air- 
craft tracks. A track was not considered 







usable unless it was at least 20 miles long. 
Loopings, multiple intersections, and other 
ambiguities also eliminated some data. Of 
the 70 tracks which remained, all were re- 
corded in the BSR. In the DSR, three of 
these were never recorded, ten were recorded 
as one plot only, and 18 extended for less 
than 20 miles. In terms of earliness of first 
detection, the BSR was superior on 50 tracks, 
the same on 19, and inferior on one; in dis- 
tance carried before the aircraft disappeared 
in the distance, the BSR was superior on 43 
tracks, the same on 26, and inferior on one; 
and in continuity of tracking it was superior 
on 13 tracks, the same on 57, and inferior on 
none. (Continuity here means the degree to 
which the plotted track appears as a closely
spaced succession of points, with few large 
gaps.) 

The regression of track length as recorded
in the BSR ($D_b$) on track length recorded in
the DSR ($D_d$) is:


$$D_b = 33.80 + 1.09\,D_d.$$


The equation indicates that the BSR conditions
have an initial average advantage of 34
miles and that for each increase of 10 miles under
the DSR conditions there is a corresponding
statistically significant increase of 10.9 miles
in the BSR.
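
The reading of the regression given above can be checked numerically. The sketch below simply evaluates the printed equation (intercept 33.80, slope 1.09, both in miles); the function and constant names are illustrative and do not come from the report.

```python
# Coefficients as printed in the regression equation (miles).
INTERCEPT_MILES = 33.80
SLOPE = 1.09


def predicted_bsr_track_length(dsr_track_miles: float) -> float:
    """Predicted Bright Scope Room track length for a given
    Dark Scope Room track length, using the printed equation."""
    return INTERCEPT_MILES + SLOPE * dsr_track_miles


if __name__ == "__main__":
    for dsr in (0, 10, 20, 50):
        bsr = predicted_bsr_track_length(dsr)
        print(f"DSR {dsr:>3} miles -> BSR {bsr:6.2f} miles")
```

Each additional 10 miles of DSR track adds 1.09 x 10 = 10.9 miles to the predicted BSR track, on top of the constant advantage of about 34 miles.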


Discussion and Conclusions 


The results of these trials have confirmed 
the findings of laboratory studies. Under 
laboratory optimum conditions, aircraft were 
detected sooner and were kept under radar 
observation longer and more continuously. 


Presumably little, if any, effect on target 
detectability can be attributed to the addi- 
tion of 0.1 footcandle of room lighting. It 
was introduced because some interaction with 
scope brightness has been demonstrated (1) 
and the following advantages accrue from its 
presence: the period of prewatch dark adapta- 
tion is reduced, controls on the displays are 
more easily identified, supervision is more ef- 
fective, auditory noise is less noticeable, main- 
tenance is simpler, service experience shows a 
reduction in the number of complaints of 
“eyestrain,” restrictions on the design of re- 
porting and other drill procedures are relaxed 
and, after some experience, operators express 
a preference for working in the lighted room. 
In addition, in areas where tasks requiring 
relatively high levels of illumination are car- 
ried on in conjunction with radar scope read- 
ing, the problem of shielding scopes to reduce 
the level to 0.1 footcandle is simple compared 
to the task of completely blacking them out. 


Received March 23, 1956. 


References 


1. Smith, A. A., & Boyes, G. E. Visibility on radar
screens: the effect of CRT bias and ambient
illumination. J. appl. Psychol., 1957, 41, 15-18.

2. Thornton, G. B. Radar range performance as a
function of CRT operating conditions. DRML
Report No. 163-3, Project No. D77-94-20-22
(H.R. No. 109).

3. Williams, S. B., Bartlett, N. R., & King, E. Visibility
on cathode ray tube screens: screen
brightness. J. Psychol., 1948, 25, 455-466.

4. Williams, S. B., & Hanes, R. M. Visibility on
cathode-ray tube screens: intensity and color
of ambient illumination. J. Psychol., 1949,
27, 231-244.





Journal of Applied Psychology 
Vol. 41, No. 1, 1957 


An Investigation of Several Methods of Teaching Contour 
Interpretation 1, 2


F. J. McGuigan 
Hollins College 


This study seeks to determine the relative 
effectiveness of several methods of teaching
map users to interpret contour lines. A set 
of contour lines is a symbol which indicates 
the shape of a terrain feature by the way in 
which the lines are curved; their number and 
the closeness with which they are spaced in- 
dicate the slope and elevation of terrain. The 
map user must learn how to visualize the 
shape of the ground, to determine direction 
of slope of terrain, and to understand and 
compute elevation, all on the basis of contour 
lines. Probably the best way to teach what 
such a symbol stands for is to have the 
learner associate it repeatedly with the corre- 
sponding terrain feature. But, in the present 
study, to increase the feasibility of such a 
learning situation, as well as possibly to in- 
crease the amount of learning, terrain fea- 
tures were represented in various ways inside 
a classroom. The techniques for representing 
terrain in this study were: (a) terrain board 
(a miniature replica of the terrain), (b)
stereoscopic slides, and (c) two-dimensional 
slides showing typical views of the terrain. 
Furthermore, two ways of presenting contour 
lines were used: first, on standard two-dimen- 
sional maps, and second, on three-dimensional 
maps which present elevated and depressed 
surfaces in conjunction with the contour lines 
that symbolize them. 

In a typical learning situation of this sort
where S must learn to visualize an object on
the basis of a symbol, two questions of general
interest concerning the most effective
learning present themselves. First, should
the representation of the object (e.g., a specific
terrain feature such as a hill) resemble
very closely the object it stands for, i.e., be
relatively concrete, or should the representation
be more distant from the actual object,
i.e., be relatively abstract? The second question
concerns the manner of presenting the
symbol that is to be associated with the representation
of the object. Should the symbol
used during training resemble more closely
the object that is represented (be relatively
concrete) or should it be more like the symbol
to be actually used (be relatively abstract)?
To answer the first question, the
ways of representing the terrain in this study
were classified along a concrete-abstract dimension
of representing the actual object, in
ascending degrees of abstractness, as follows:
terrain board, stereoscopic slides, two-dimensional
slides. To answer the second question,
the two-dimensional presentation of the contour
lines on a map was classified as relatively
abstract, while the three-dimensional presentation
on a map was classified as relatively
concrete.


1 The research reported here was conducted by the
author while he was employed by the George Washington
University, Human Resources Research Office,
operating under contract to the Department of the
Army. Opinions and conclusions are those of the
writer and do not necessarily represent views of the
University or the Department of the Army.

2 The author is indebted to James W. Grubb for
assisting in both the construction of the criteria and
in the data collection. Also to James S. Calvin for
some suggestions concerning the design.


Method 


Subjects. Two companies of basic trainees from
Camp Chaffee, Arkansas, were used. The experiment
was conducted first on one company and repeated
with the second company. In each company
162 Ss were placed in nine matched groups, 18 Ss
to a group, on the basis of Pattern Analysis scores
of the Army Classification Battery. Previous unpublished
research indicates that Pattern Analysis
scores are positively correlated with contour interpretation.
None of the 324 Ss had previously received
any Army map instruction.

Procedure. The specific methods of learning used
were various combinations of the above techniques
for representing land features and contour lines.
The general approach was to have S associate a
representation of the land feature with its corresponding
contour lines. More specifically, S was
given a two-dimensional or a three-dimensional map
with contour lines on it, and shown fifteen views of
various land features on a terrain board or on one
of the two types of slide. His task was to associate
the land feature with the corresponding contour representation
of it as fast as possible. Numerous such
associations were asked for. For example, the instructor
would point out a hill, discuss its slope and
other characteristics, and then have the trainee look
at the corresponding pattern of contour lines. The
following combinations were the bases of the learning
methods:


1. Terrain board and two-dimensional maps;

2. Terrain board and three-dimensional maps;

3. Stereoscopic slides and two-dimensional maps;

4. Stereoscopic slides and three-dimensional maps;

5. Two-dimensional slides and two-dimensional
maps;

6. Two-dimensional slides and three-dimensional
maps;

7. Composite (terrain board, two-dimensional
slides, three-dimensional maps and two-dimensional
maps).


It can be seen that the first six methods form a 
3 X 2 factorial design. In addition, two control con- 
ditions (Methods 8 and 9) were used. These were 
a No-Training condition, and the Standard Army 
method. The No-Training condition consisted of in- 
struction in using the compass, no aspect of which 
was related to contour lines. The Standard Army 
method consisted of telling what a contour line is, 
what the kinds of slope are, how to compute eleva- 
tion, etc. 

Each of the nine matched groups was assigned at 
random to one of the above nine training methods. 
The period of instruction was 50 min. for all groups 
except the Composite group which required 100 min. 
All of the training in each company was given by 
the same instructor. The same instruction was given 
to all experimental groups, except that the manner 
of representing the terrain and the contours was 
varied. 
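
The layout of the nine conditions described above can be sketched in a few lines. This is only an illustration: the names `training_methods` and `assign_groups` are hypothetical, and the shuffle stands in for whatever randomization procedure was actually used, which the article does not describe.

```python
import itertools
import random

# Factor A: three ways of representing the terrain (concrete -> abstract).
TERRAIN_REPRESENTATIONS = [
    "Terrain board",
    "Stereoscopic slides",
    "Two-dimensional slides",
]
# Factor B: two ways of presenting the contour lines.
MAP_TYPES = ["two-dimensional maps", "three-dimensional maps"]


def training_methods():
    """Return the nine conditions in the order of Methods 1 to 9."""
    # Methods 1-6: the 3 x 2 factorial combinations.
    factorial = [
        f"{terrain} and {maps}"
        for terrain, maps in itertools.product(TERRAIN_REPRESENTATIONS, MAP_TYPES)
    ]
    # Method 7 is the Composite; Methods 8 and 9 are the control conditions.
    return factorial + ["Composite", "No-Training", "Standard Army"]


def assign_groups(group_ids, seed=None):
    """Randomly assign each matched group to one training method."""
    methods = training_methods()
    group_ids = list(group_ids)
    if len(group_ids) != len(methods):
        raise ValueError("need exactly one group per method")
    rng = random.Random(seed)
    rng.shuffle(methods)
    return dict(zip(group_ids, methods))


if __name__ == "__main__":
    for group, method in assign_groups(range(1, 10), seed=1957).items():
        print(f"Group {group}: {method}")
```

Enumerating the factorial cells with `itertools.product` reproduces the order of the numbered list, so the sixth entry is the two-dimensional-slides and three-dimensional-maps condition discussed in the results.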

Criteria. Three tests were administered at the end 
of training, all based on the interpretation of con- 
tour lines from a two-dimensional map. The first 
was the Contour Interpretation Test, which con- 
sists of 10 map sections. The test includes three 
questions on each map section, one requiring the 
trainee to select the highest of five points, and two 
asking him how many times he would have to walk 
uphill if he moved along lines drawn on the map 
section. The other two tests were the Contour Visualization
Tests I and II, pencil-and-paper tests which
contain 98 questions in all. These questions ask S 
to interpret contour lines, to visualize the shape of 
land features, to determine elevation and direction 
of slope, and to understand the characteristics and 
definitions of contour lines. 


Results and Discussion 


Criteria. Intercorrelations between the three 
tests were computed as follows: each two 
tests were correlated separately for each ex- 
perimental group of each company (thus 


yielding 54 separate correlations). A chi- 
square test showed these correlations to be 
homogeneous; the correlations were then averaged,
yielding the following values: between
the Contour Interpretation Test (CIT)
and the Contour Visualization Test I (CVT-I),
.70; between the Contour Interpretation
Test and the Contour Visualization Test II
(CVT-II), .60; between the Contour Visualization
Tests I and II, .80.

To determine the reliabilities of the cri- 
teria, odd-even correlations were computed 
by the Spearman-Brown formula for each 
group. The chi-square test of homogeneity 
showed significant variation between the sets 
of reliabilities of CVT-I and CVT-II, so that 
average reliabilities could not be computed. 
The reliability of the separate groups, how- 
ever, varied as follows: Contour Interpreta- 
tion Test, .70 to .94; Contour Visualization 
Test I, .72 to .99; and Contour Visualization 
Test II, .51 to .97. 
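
For readers unfamiliar with the odd-even procedure, the Spearman-Brown step can be sketched as follows. The article gives only the resulting reliabilities, so the half-test correlation used in the example is an invented value for illustration, and the function name is hypothetical.

```python
def spearman_brown(half_test_r: float, length_factor: float = 2.0) -> float:
    """Step up a correlation between test parts to the reliability of a
    test `length_factor` times as long (2.0 for odd-even halves)."""
    return (length_factor * half_test_r) / (
        1.0 + (length_factor - 1.0) * half_test_r
    )


if __name__ == "__main__":
    # Illustrative value only: an odd-even correlation of .50.
    print(round(spearman_brown(0.50), 3))
```

With two halves the formula reduces to 2r / (1 + r), so the full-length reliability is always at least as large as the correlation between the halves when r is positive.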

Methods of training. An F test among the 
nine training methods for each of the three 
criteria yielded values which were significant 
beyond the 1% level (see Table 1). To de- 
termine which methods were significantly su- 
perior, Duncan’s Range Test (2) was ap- 
plied. In comparing the No-Training method 
with the other methods, it was found that on 
the CIT all the groups which received train- 
ing were significantly superior to the No- 
Training group beyond the 1% level. On 
the CVT-I, all training methods except 
Method 4 were significantly superior to the 
No-Training method beyond the 5% level, 
and all except Methods 4 and 1 at the 1% 
level. On the CVT-II, all methods except 
Method 3 were significantly superior to the 
No-Training method beyond the 1% level. 
In general, then, it may be concluded that 
the No-Training condition led to relatively 
poor performance and that training of the 
sort given in the other eight methods is defi- 
nitely beneficial. 
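
The F ratios in Table 1 can be checked from the printed sums of squares and degrees of freedom. The sketch below assumes that each effect was tested against the Error (C x M x PA) term, which is an inference from the table rather than something the article states; under that assumption the CIT entries for Methods and Companies come out at the printed 2.16 and 6.81.

```python
def f_ratio(ss_effect: float, df_effect: int, ss_error: float, df_error: int) -> float:
    """F = (mean square for the effect) / (mean square for error)."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error


# CIT entries as printed in Table 1; error term assumed to be C x M x PA.
CIT_ERROR = (755_920, 119)

if __name__ == "__main__":
    print("Methods:  ", round(f_ratio(95_840, 7, *CIT_ERROR), 2))
    print("Companies:", round(f_ratio(43_264, 1, *CIT_ERROR), 2))
```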

A comparison of the mean proficiency of
the groups can be made from Table 2.
Method 6 (two-dimensional slides and three-dimensional
maps) had the highest mean
score on all three criteria, except the CVT-I,
on which the Composite method surpassed it
by .11 point. No other condition rivals
Method 6 consistently on any of the three
criteria. Since a major question in this experiment
is whether the Standard Army
method can be improved on, a comparison of
Groups 6 and 9 is especially interesting. Although
the two-dimensional-slides and three-dimensional-maps
condition is not significantly
superior to the Standard Army method on the
CIT, it is significantly superior on the CVT-I
and II beyond the 5% and the 1% levels respectively.
By comparing both methods to
the “zero” proficiency level indicated by the
No-Training method, it was found that
Method 6 led to 41%, 70%, and 55%
greater proficiency on the three criteria, respectively.
It is also interesting to note that
Method 6 is at least as good as the Composite
method (which allowed 100 min. of training),
and possibly better. On the CVT-II Method
6 is superior to the Composite method beyond
the 1% level. Since the above comparisons
were made after looking at the data
(for the practical purpose of selecting the best
method) it is desirable that these results be
cross validated.

We may now ask whether the two-dimensional-slides
and three-dimensional-maps method
is superior for all levels of pattern analysis
aptitude. An F test of the methods × levels
interaction is not significant. Therefore, the
influence of pattern analysis upon contour
proficiency appears uniform for all methods
of instruction, and we may conclude
that the two-dimensional-slides and three-dimensional-maps
condition is a superior instructional
method regardless of the aptitude level
of the S. A further F test shows that there
are significant differences beyond the 1% level
among men who have different Pattern Analysis
aptitudes. From this finding it may be
concluded that Ss with higher levels of this
aptitude were significantly more proficient in
interpreting contour lines than lower level
personnel. It may be noted that the interclass
correlation between Pattern Analysis
scores and the CIT was .86; the CVT-I and
II correlated .94 and .86, respectively, with
Pattern Analysis.


Table 1

Analyses of Variance for All Experimental Conditions on the Three Criteria

                             CIT                        CVT-I                       CVT-II
Source of             Sum of           F         Sum of            F         Sum of            F
Variance              Squares    df   ratio      Squares     df   ratio      Squares     df   ratio

Companies              43,264     1   6.81*         9,216     1                4,096      1    .62
Methods                95,840     7   2.16*       437,840     7              267,888      7   5.83**
Pattern Analysis      911,144    17   8.44**    8,822,564    17            1,244,444     17
M × PA                755,128   119   1.00      2,943,100   119              692,340    119    .89
M × C                  83,456     7   1.88        197,728     7               50,288      7   1.09
PA × C                142,640    17   1.32        298,836    17               73,844     17    .66
Error (C × M × PA)    755,920   119             2,624,780   119              781,276    119

* Significant beyond the 5% level.
** Significant beyond the 1% level.


Table 2

Mean Scores for the Nine Training Methods on the Three Criteria

(Scores are pooled from both companies)

                                                            Criteria
Method                                                 CIT     CVT-I    CVT-II

1. Terrain board, two-dimensional maps                12.31
2. Terrain board, three-dimensional maps              11.72
3. Stereoscopic slides, two-dimensional maps          10.72
4. Stereoscopic slides, three-dimensional maps        12.14
5. Two-dimensional slides, two-dimensional maps       10.67
6. Two-dimensional slides, three-dimensional maps     12.53
7. Composite                                          11.92
8. No-training                                         9.11
9. Standard Army Method                               11.53
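
The percentage comparison of Method 6 with the Standard Army method can be illustrated with the CIT means from Table 2 (12.53 for Method 6, 11.53 for the Standard Army method, 9.11 for No-Training). The article does not spell out the computation; the sketch below assumes that each method's gain was measured from the No-Training mean taken as the "zero" level, a reading that reproduces the reported 41% for the CIT. The function name is hypothetical.

```python
def relative_gain_pct(method_mean: float, reference_mean: float, baseline_mean: float) -> float:
    """Percentage by which one method's gain over the baseline exceeds
    the reference method's gain over the same baseline."""
    method_gain = method_mean - baseline_mean
    reference_gain = reference_mean - baseline_mean
    return 100.0 * (method_gain / reference_gain - 1.0)


if __name__ == "__main__":
    # CIT means: Method 6, Standard Army, No-Training.
    print(round(relative_gain_pct(12.53, 11.53, 9.11)))
```

The gains are 3.42 and 2.42 points respectively, and 3.42 / 2.42 is about 1.41, that is, roughly 41% greater proficiency above the No-Training level.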

To determine whether the representation of
the actual object should be abstract or concrete,
an F test among the three ways of representing
the terrain conditions was made
separately for each criterion, i.e., to determine
whether there was a significant difference
in performance as a result of using the
two-dimensional slides, the stereoscopic slides,
or the terrain board. On the CIT the F test
yielded a value that was not significant
(F = .28); on the CVT-I and II, however, the
F was significant beyond the 5% level
(F = 3.22) and the 1% level (F = 6.65), respectively.
On these latter two criteria, then,
it was necessary to determine where the sig- 
nificant differences were. Applying Duncan’s 
Range Test to answer this question, it was 
found that, on the CVT-I, the two-dimen- 
sional-slides condition was superior to the 
stereoscopic-slides condition beyond the 5% 
level; no other significant differences were 
found. On the CVT-II, the two-dimensional- 
slides condition was superior to the stereo- 
scopic-slides condition beyond the 1% level, 


but the terrain-board condition was superior 
to the stereoscopic-slides condition beyond the 


5% level. The two-dimensional-slides condition
and the terrain-board condition did not differ 
significantly on the CVT-II. One might have 
hoped to find greater consistency in the find- 
ings, e.g., that the two-dimensional-slides con- 
dition was inferior to the stereoscopic-slides 
condition, which in turn was inferior to the 
terrain-board condition. In such a case, one 
might then conclude that the more concretely 
the actual object is represented, the greater 
the learning. However, with the above rather 
discrepant results, no definite conclusion is 
possible about whether abstract or concrete 
representation is best. One might, however, 
speculate as to why the three-dimensional 
slides led to relatively poor learning. One 
possibility is that the trouble required to put 
on polaroid glasses and to adjust to disparate 
images, with possible undesirable visual after- 
effects, biased this condition unfavorably. 
Perhaps a three-dimensional technique that 
did not have such variables confounded with 


depth perception would show a more favor- 
able result. 

If the three-dimensional-slides condition was 
prejudiced in this manner, then the lack of 
a significant difference between the terrain- 
board condition and the two-dimensional- 
slides condition is interesting. One possible 
interpretation would be in accord with a 
previous finding by Aikman, Lorge, Tuck- 
man, Spiegel, and Moss (1). They found 
that “the degree of remoteness from reality” 
has little or no effect on problem-solving proficiency.
If we limit our comparison to the
terrain-board and the two-dimensional-slide
conditions, our conclusion would suggest that
the degree of concreteness of the representation
of the actual object makes little difference
as far as learning is concerned. If our
“degree of concreteness” can be equated to 
the “degree of remoteness” of Aikman et al., 
our results would seem to confirm theirs. 

We have above been concerned with the 
question as to whether or not the representa- 
tion of the actual object should be concrete 
or abstract. We must now turn to the ques- 
tion of whether the symbol that stands for 
the object should be concrete or abstract. 
To answer this question, F tests were run be- 
tween the two conditions for representing the 
symbol, i.e., between the three-dimensional- 
map and the two-dimensional-map conditions 
separately on the three criteria. If, for in- 
stance, it were found that the three-dimen- 
sional-map conditions were superior on all 
three criteria, we could conclude that it is 
better to present the learner with relatively 
concrete symbols. The Fs, however, were not
significant on the first two criteria, but were
significant beyond the 1% level (F = 8.62) on
the CVT-II.3 We are thus faced with the
situation where two criteria suggest that it
does not seem to make any difference whether
the symbol is abstract or concrete; but the
third criterion, on the other hand, suggests
that, because the three-dimensional-map condition
is superior, the concrete symbol leads
to greater learning. This is a particularly
amazing result in view of the high intercorrelations
among the three criteria. Though
these findings do not present a very definite
picture, it may tentatively be concluded that
learning to use an abstract symbol will be
facilitated by starting with one that is more
concrete than it is.


3 The interaction between the manner of representing
the object and the manner of representing the
symbol in the 3 × 2 factorial design was not significant.


Summary 


This study seeks to determine: (a) the 
relative effectiveness of several methods of 
teaching map users to interpret contour lines, 
(b) whether, for most effective learning, a 
learner should be presented with a concrete 
or abstract representation of an object that 
he must learn to visualize through the use of 
a symbol, and (c) whether the symbol which 
he uses to visualize the object should be ab- 
stract or concrete. 

Two companies of 162 Ss each were trained 
by the use of various combinations of con- 
crete or abstract representations of terrain 
(a terrain board, two-dimensional slides and 
three-dimensional slides of the terrain), and 
concrete or abstract symbols (contour lines on 
two-dimensional or three-dimensional maps). 
Learning resulting from these methods was 
compared to the Standard Army method, and 
a No-Training (control) condition. 

The results showed that the training method 
involving representation of terrain by two-di- 
mensional slides, and presenting the symbol 


on a three-dimensional map, generally led to 
highest proficiency. In particular it is sig- 
nificantly superior to the Standard Army 
method. No definite conclusion was possible 
regarding question b above, although one possible
interpretation is that variation of the
representation of terrain along an abstract- 
concrete dimension does not affect learning. 
The data also suggest that the symbol should 
be of a relatively concrete nature. 

In addition, it was found that Pattern Analy- 
sis scores of the Army Classification Battery 
were highly related to contour interpretation 
proficiency (r = .86 to .94), and that Ss with 
high Pattern Analysis scores learn contour in- 
terpretation significantly better than lower 
aptitude Ss. 

Received May 14, 1956. 


References 


1. Aikman, L., Lorge, I., Tuckman, J., Spiegel, J.,
& Moss, Gilda. Differences in the quality of
the solution to a practical field problem at
various degrees of remoteness from reality.
Amer. Psychologist, 1954, 8, 311. (Abstract)

2. Duncan, D. B. Statistical inference problems concerning
differences between ranked treatments
in an analysis of variance. Multiple Range
and Multiple F Tests. Blacksburg, Va.: Dept.
of Statistics and Statistical Laboratory, Virginia
Agricultural Experiment Station, June,
1954. (Tech. Rep. No. 9.)








Journal of Applied Psychology
Vol. 41, No. 1, 1957 


Seniority and Criterion Measures of Job Proficiency 


Rutledge Jay and James Copes 
Detroit, Michigan 


Research workers engaged in the task of 
validating tests or selection factors based 
upon interview data for the purpose of pre- 
dicting occupational success need to know the 
components of variance in the criterion meas- 
ures commonly used. 

Ghiselli and Brown (1) have suggested that 
extraneous factors influencing measures of 
job success may introduce errors in research. 
Jay and Copes (2) have shown that the num- 
ber of years of schooling completed is one 
of the more important factors of this kind. 
They found that the magnitude of the cor- 
relation between criterion measures of job 
proficiency and the amount of formal school- 
ing completed varied with the type of cri- 
terion measure and increased with the skill 
level of the occupation of the workers rated. 

The purpose of this report is to analyze 
and discuss data relevant to the following 
questions: Is seniority an extraneous factor 
influencing measures of job success? Does 
the type of measure of job success influence 
the relation between seniority and measures 
of job success? Does the skill level of the 
occupation sampled influence this relation? 
Does the magnitude of these influences have 
any practical significance? 


The Type of Measure of Job Success 


Measures of proficiency on the job may be 
classified in a variety of ways. Some are 
relatively more objective, such as units of 
production per week or per hour, units of pro- 
duction relative to the average of the produc- 
tion group, versatility measured in terms of 
the number of related operations which the 
worker can perform up to production stand- 
ards, or quality of work measured in terms of 
the number of pieces scrapped or requiring 
reworking. 

Other measures of work proficiency are 
relatively more subjective, such as rank-order 
supervisory ratings, ratings using normalized 
ranks, broad-category supervisory ratings using
descriptive verbal categories, specific in
contrast to global ratings, ratings by one or 
more supervisors, ratings based on paired 
comparisons, or ratings related to some gen- 
eral subjective standard. 

The type of measure of job success may 
influence the relation between seniority, de- 
fined as the number of months on the job or 
with the company, and measured job profi- 
ciency. Since the available data include 
studies using a variety of types of measures 
of job success, this hypothesis can be tested. 


The Skill Level of the Occupation 


Jobs which are systematized by technologi- 
cal procedures may eliminate individual dif- 
ferences to a considerable extent; or labor un- 
ions and supervisors may recognize individual 
differences in skilled jobs while denying them 
in less skilled occupations. 

Since the available data include studies of 
occupations which are classified as to skill 
level, this hypothesis can be tested. 


Method 


The Bureau of Employment Security, United States 
Department of Labor, summarizes the results of test 
research conducted by the various State Employ- 
ment Services in the form of technical reports. 
These technical reports include the correlations be- 
tween various types of measures of job proficiency 
and the number of months of experience on the job 
or with the company. These data are also sum- 
marized in Guides (4) which are revised from time 
to time. All of the data analyzed below are from 
these sources. 

The decision to use data gathered by the Bureau
of Employment Security was based upon the fact
that such data are particularly well suited to the
purposes of this investigation. Almost nowhere else
has such a large amount of occupational research
been planned, supervised, and reported in a uniform
and standardized manner. The investigator is not
troubled by a variety of reporting procedures and
missing or noncomparable data. The variety of occupations
studied is particularly great. The research
has been conducted in many parts of the United
States and represents an excellent geographical coverage.
The research has been conducted by many
different investigators. The studies include an extensive
number of cross validations. Finally, the
conclusions which may be drawn from these data
support a greater degree of generalization than would
be appropriate otherwise.


Table 1

Average Correlations Between Seniority and Types of
Criterion Measures of Job Proficiency

                                        Number of
Type of Criterion                        Studies        N       r      σr

Broad-Category Supervisory Ratings         17         1,028    .13    .03
Weekly or Hourly Units of Production                    454    .20    .05
Rank-Order Supervisory Ratings                                 .21
School Grades or Proficiency Tests                       35    .41    .18

All Types of Criteria                      47         2,462    .17    .02

Table 2

Average Correlations Between Seniority and Criterion
Measures of Job Proficiency by Skill Level
of Occupations Rated

Skill Level                                       Studies      N        r

Unskilled                                            5         258    -.05
Semiskilled                                         14       1,015     .17
Skilled                                             13         642     .27
Professional, Semiprofessional and Technical         6         210     .25

All Skill Levels                                    38       2,125     .19


The correlations between seniority and criterion
measures of job proficiency were transformed to z
equivalents and the standard errors of the average
zs were computed and transformed back
to rs through the use of tables (3).
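
The averaging procedure described above can be sketched in code instead of tables. The article does not say how the individual z values were weighted; the sketch assumes the conventional weighting by n - 3, with the standard error of the average z taken as one over the square root of the summed weights. The function name and the example figures are illustrative only.

```python
import math


def average_correlation(studies):
    """Average correlations through Fisher's z transformation.

    `studies` is an iterable of (r, n) pairs. Returns the average r and
    the standard error of the average z (in z units).
    Assumption: each z is weighted by n - 3.
    """
    total_weight = 0.0
    weighted_z_sum = 0.0
    for r, n in studies:
        weight = n - 3
        weighted_z_sum += weight * math.atanh(r)  # r -> z
        total_weight += weight
    mean_z = weighted_z_sum / total_weight
    se_mean_z = 1.0 / math.sqrt(total_weight)
    return math.tanh(mean_z), se_mean_z  # z -> r


if __name__ == "__main__":
    # Invented studies for illustration: (correlation, sample size).
    example = [(0.10, 60), (0.25, 120), (0.20, 45)]
    mean_r, se_z = average_correlation(example)
    print(f"average r = {mean_r:.2f}, SE of average z = {se_z:.3f}")
```

Averaging in z units rather than averaging the raw correlations avoids the skew of the sampling distribution of r, which is the usual reason for making the transformation.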




Results 


Table 1 shows the type of criterion, the
number of studies included in each type, the
total number of workers rated within each
type, the average correlation between seniority
and each type of criterion measure of
work proficiency, and the standard error of
the average correlation.


Table 3

Average Correlations Between Seniority and Types of Criterion Measures of Job Proficiency
Grouped by Skill Level of the Occupations Rated

                                                  Number of
Skill Level                                        Studies

Unskilled Occupations
    Rank-Order Supervisory Ratings                    1
    Weekly or Hourly Units of Production              2
    Broad-Category Supervisory Ratings                2
    School Grades or Proficiency Tests             No data
Semiskilled Occupations
    Rank-Order Supervisory Ratings
    Weekly or Hourly Units of Production
    Broad-Category Supervisory Ratings
    School Grades or Proficiency Tests
Skilled Occupations
    Rank-Order Supervisory Ratings
    Weekly or Hourly Units of Production
    Broad-Category Supervisory Ratings
    School Grades or Proficiency Tests
Professional, Semiprofessional and Technical Occupations
    Rank-Order Supervisory Ratings                    6
    Weekly or Hourly Units of Production           No data
    Broad-Category Supervisory Ratings             No data
    School Grades or Proficiency Tests             No data

On the basis of 47 studies involving meas- 
urements of the proficiency of 2,462 work- 
ers, the number of months on the job or with 
the company has a small positive average 
correlation with various measures of job 
proficiency. This average correlation is sta- 
tistically significant. However, the influence 
of seniority upon the measurement of job 
proficiency varies with the type of criterion 
measure. 

Table 2 shows the skill level of the occu- 
pation rated, the number of studies, the total 
number of workers rated, the average corre- 
lation, and the standard error of the average 
correlation. The average correlation between 
seniority and criterion measures of job profi- 
ciency increases with the skill level of the oc- 
cupation of the workers rated. 

Table 3 shows the type of criterion, the 
number of studies, the total number of work- 
ers rated, the average correlation between 
seniority and criterion measures of work 
proficiency, and the standard error of the 
average correlation. The occupations in each 
skill level grouping are classified by type of 
criterion rating. 

For unskilled occupations none of the types 
of measures of job proficiency appears to in- 
fluence the correlation between seniority and 
job proficiency. None of the average corre- 
lations are statistically significant. 

For semiskilled occupations all of the cor- 
relations are statistically significant. The re- 
searcher who might wish to minimize the in- 
fluence of experience or seniority upon the 
criterion measure of job proficiency for semi- 
skilled occupations might choose either rank- 
order or broad-category ratings in preference 
to weekly or hourly units of production. 

For skilled occupations all of the correlations between seniority and the criterion measures of job proficiency are statistically significant. The research worker desiring to
control the influence of seniority or experi- 
ence upon his criterion measure for these oc- 
cupations would also choose either the rank- 
order or broad-category method in preference 
to any other for which data are available. 


Conclusion 


Research workers engaged in the task of 
validating tests or selection factors based 
upon interview data for the purpose of pre- 
dicting occupational success need to know the 
components of variance in the criterion meas- 
ures of job proficiency commonly used. 

Seniority is an extraneous factor influenc- 
ing measures of job proficiency. On the basis 
of an analysis of 47 studies involving meas- 
ures of the job proficiency of 2,462 workers 
employed in 39 different occupations, the av- 
erage correlation between seniority and job 
proficiency is .17 with a standard error of 
correlation of .02. 

The type of measure of job success influ- 
ences the relation between seniority and job 
success. Rank-order and broad-category su- 
pervisory ratings are least influenced by sen- 
iority when the skill level of the occupations 
of the workers rated is taken into account. 

The skill level of the occupations sampled 
is an extraneous factor influencing measures 
of job success. The influence of seniority in- 
creases with the skill level of the occupations 
sampled. 

The magnitude of these influences does not 
have any practical significance for unskilled 
jobs; it may be significant when semiskilled 
jobs are being studied; but with skilled occu- 
pations the influence appears to be large rela- 
tive to the magnitude of validity coefficients 
commonly obtained and should be taken into 
account and controlled whenever possible. 


Received April 2, 1956.


References 


1. Ghiselli, E. E., & Brown, C. W. Personnel and industrial psychology. New York: McGraw-Hill, 1948.

2. Jay, R. L., & Copes, J. Education and criterion measures of job proficiency. Detroit: unpublished report, 1956.

3. McNemar, Quinn. Psychological statistics. New York: Wiley, 1949.

4. U. S. Department of Labor, Bureau of Employment Security. Guide to the use of general aptitude test battery (Section III: development). Washington, D. C.: undated, revised periodically.





Journal of Applied Psychology 
Vol. 41, No. 1, 1957 


The Relationship Between Grades and a Predictive Test 
Battery in the School of Pharmacy of The George 
Washington University ¹


Suzanne D. Hill 


The George Washington University 


The purpose of this study was to investi- 
gate the effectiveness of a battery of predic- 
tive tests used for selection by the School of 
Pharmacy of The George Washington Uni- 
versity. 

Procedure 


The sample. Beginning in the fall of 1949, stu- 
dents applying for admission to the School of Phar- 
macy were given a battery of four tests. In accord- 
ance with established university policy, students in 
the upper fifth of their high school graduating class 
were not required to take this battery of tests. Dur- 
ing the four years 1949-53, 116 students were tested. 
These 116 cases made up 63% of the total number 
of applicants for these four years, the remaining 37% 
being from the upper fifth of their graduating classes.
Of the 116 students, 30 did not complete registration 
and 12 did not take the complete battery. There- 
fore, a total of 74 students, 40% of the original ap- 
plicants, took the battery and completed their regis- 
tration. 

The predictive tests. The four predictive tests used were the American Council on Education Psychological Examination for College Freshmen (1949-53 editions), the Ohio State Psychological Examination, Part 3, the Purdue Mathematics Training Test, Form XM, and the Iowa Chemistry Test.

The criterion. During the first two years of train- 
ing, the School of Pharmacy prescribes 46 hours of 
courses in science and professional subjects which 
must be successfully completed by all students. The 
quality-point index of grades earned on these courses 
was the criterion in this study. 

The statistical methods used. All zero order cor- 
relations were computed according to the Pearson 
product-moment technique. The multiple correla- 
tions for all combinations of the four predictive tests 
with the criterion were computed according to the 
Wherry-Doolittle method. The significance of R 
was checked by the analysis of variance technique 
recommended by McNemar (1). Correction for 
shrinkage was applied according to McNemar (1). 
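
The correction for shrinkage can be illustrated with a short sketch. It assumes the familiar adjustment 1 − (1 − R²)(N − 1)/(N − k − 1) for k predictors; the source cites McNemar (1) without giving the formula, so the choice of formula and the function name are assumptions.

```python
import math


def shrunken_multiple_r(r, n_cases, n_predictors):
    """Multiple correlation corrected for shrinkage.

    Assumption: the common adjustment
        R_adj^2 = 1 - (1 - R^2) * (N - 1) / (N - k - 1)
    is used; the paper only cites McNemar (1) for its correction.
    """
    adjusted_r2 = 1.0 - (1.0 - r * r) * (n_cases - 1) / (n_cases - n_predictors - 1)
    # A negative adjusted R^2 is conventionally reported as zero.
    return math.sqrt(max(adjusted_r2, 0.0))


# R = .61 and N = 74 are the figures reported in this study.
print(round(shrunken_multiple_r(0.61, 74, 4), 2))
print(round(shrunken_multiple_r(0.61, 74, 3), 2))
```

With a sample of 74 and only three or four predictors, the correction lowers the multiple correlation by a few hundredths at most under this formula.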

Once the best combination of tests had been found by multiple regression methods, the distribution of test scores was divided into lower and upper halves and the percentages of passes and failures in these two categories noted. The average percentile score for the four predictive tests was computed for each S. The median score for the sample was computed for the complete test battery and for the battery minus the Purdue Mathematics Training Test. The scores were then divided into lower and upper halves for each battery at this median score. The criterion was divided into two categories using those established by The George Washington University for academic suspension or honors. The chi-square technique was applied to the differences between the upper and lower halves of the distribution.

¹ This study was conducted at The George Washington University in partial fulfillment of the requirements for the M.A. degree. Dr. Thelma Hunt deserves much credit and my deep appreciation for her assistance and advice, and my thanks go to C. W. Bliven, Dean of the School of Pharmacy, for his consideration and help with this problem.
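
The comparison of upper and lower halves by chi square can be sketched as follows. This is a minimal illustration that assumes a fourfold table, one degree of freedom, and no continuity correction; the source does not say whether a correction was applied, and the cell counts below are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi square for a fourfold table [[a, b], [c, d]].

    Assumption: no continuity correction is applied.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail probability of chi square with one degree of freedom."""
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts: rows are upper and lower halves of the test
# scores; columns are students above and not above the honors index.
chi2 = chi_square_2x2(20, 17, 10, 27)
print(round(chi2, 2), round(p_value_df1(chi2), 3))
```

Applied to a chi square of 4.38, `p_value_df1` gives a probability between .03 and .04, which agrees with the rounded value of .04 reported for the failure category.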


Results 


Table 1 gives the intercorrelations of the 
four predictive tests with each other, the cor- 
relations of each predictive test with the cri- 
terion, and the multiple correlation for all 
the combinations of the four predictive tests 
with the criterion plus corrections for shrink- 
age. The largest r between predictive tests 
and criterion was .51 between the Ohio State, 
Part 3, and the criterion. When all four tests 
are used together the predictive power is increased from .51 to .61. When only three of
the tests are used together as a battery, the 
R coefficient remains at .61. It appears that 
the Purdue Mathematics Test can be elimi- 
nated from the test battery without loss in 
power. 

Table 2 gives the comparison of perform- 
ance between the upper and lower halves of 
the test-score distributions for the test bat- 
tery made up of all four predictive tests and 
for the test battery made up of three pre- 
dictive tests, the Purdue Mathematics Test 
omitted. With all four tests, 59% of the 
students whose scores fell in the upper half 
of the test distribution achieved a quality- 
point average which remained above 2.50 for 
the entire two years while only 22% of the 
students whose scores fell in the lower half 
of the test distribution fell in this category. 
This difference might be expected by chance alone not more than one time in 100. Of the students whose grades fell below 1.50, 16% were from the upper half of the test scores while 38% were in the lower half. This difference is significant with a probability of .04.


Table 1

Intercorrelations, Correlations with Criterion, Multiple Correlations with Criterion, and Corrected Correlations for the Predictive Tests (N = 74)

Test or Combination        Criterion
1. Ohio State No. 3        .51
2. ACE                     [illegible]
3. Iowa Chemistry          [illegible]
4. Purdue Math.            [illegible]
1, 2, 3 & 4                .61*
1, 2 & 3                   .61*
1 & 3                      [illegible]
1, 3 & 4                   [illegible]
1, 2 & 4                   [illegible]
1 & 2                      [illegible]
2 & 3                      [illegible]
2, 3 & 4                   [illegible]
3 & 4                      [illegible]

* Significant at 1% level.

[The intercorrelation and McNemar correction columns are not legible in this copy. Other legible entries, whose rows cannot be determined: .49*, .48*, .59*, .56*.]


When the same analysis was computed for the test battery consisting of the Ohio State, Part 3, the ACE, and the Iowa Chemistry tests, the results were approximately the same as those for the battery made up of all four predictive tests. Here, the discrimination for those students whose index will fall below 1.50 is better than in the battery made up of all four tests.


Table 2

Comparison of Performance Between Upper and Lower Halves of Test Battery (a) of Four Predictive Tests and (b) of Three Predictive Tests

Group                              Failed (Less Than 1.50)    Honors (Above 2.50)
a. All Four Predictive Tests
   Upper half                      16%                        59%
   Lower half                      38%                        22%
   χ²                              4.38                       7.15
   p                               .04                        .01
b. Predictive Tests 1, 2 & 3
   Upper half                      14%                        57%
   Lower half                      41%                        24%
   χ²                              6.80                       8.10
   p                               [illegible]                [illegible]


In general, then, the two combinations of tests are effective in discriminating between those students who might be expected to fail and those students who might be expected to maintain an average above 2.50 during the first two years of pharmacy school training.


Summary and Conclusions 


A predictive battery of four tests has been 
studied in relation to success in the School of 
Pharmacy of The George Washington Univer- 
sity. The sample included all students taking 
the entire battery of tests during the years 
1949-53, a total of 74 cases. The following 
conclusions may be drawn: 

1. The battery of four tests showed a sub- 
stantial validity with grades in the pharmacy 
school. 

2. Elimination of the Purdue Mathematics 
Training Test from the battery does not de- 
crease the effectiveness of the battery. 


Received May 21, 1956. 


Reference 


1. McNemar, Q. Sampling in psychological re- 
search. Psychol. Bull., 1940, 37, 331-365. 





Journal of Applied Psychology 
Vol. 41, No. 1, 1957 


A Comparison of the Academic Aptitude of University 
Extension Degree Students and Campus Students 


Hollis B. Farnum 


University of Rhode Island, Division of University Extension 


Each year a larger percentage of the general population is enrolling in adult education
classes. In 1955, 35 million adults in the 
United States were enrolled in such courses 
(2). These courses range from practical 
classes in arts and crafts conducted by local 
high schools and other community organiza- 
tions to advanced degree programs conducted 
by many of the outstanding colleges and uni- 
versities in the country. Extension educators 
have noted a marked trend in recent years 
toward the selection of courses which carry 
credit toward some specific goal such as a 
college degree or certification in some spe- 
cialty area such as real estate, insurance, in- 
dustrial management, and engineering. 

There are several factors which have ini- 
tiated this trend and which may cause it to 
continue to expand and spread throughout 
the educational hierarchy. The first factor is 
the greater weight being placed on education
by the general public. Second is industry’s 
emphasis on higher educational requirements 
for both initial selection and promotion of 
employees. Related to this is the necessity 
for increased technical training brought about 
by the complexity of recent technological de- 
velopments in industry. A third factor re- 
lated to the increasing interest in adult 
education is the increasing age level of the 
general population. Many retired individuals 
or individuals approaching retirement are en- 
rolling in courses either to train themselves 
for a new field of work or for cultural and 
avocational purposes. A fourth factor which 
will lead to the expansion of college and uni- 
versity adult education classes is the ex- 
pected tidal wave of students applying for 
admission to college starting in 1958. It is 
expected that college enrollments will in- 
crease from a current enrollment figure of 
2,629,293 students to 3,338,656 by 1960 and 
4,382,082 by 1965 (5). It is a generally accepted fact among college administrators that due to limited facilities many students who
are capable of doing college level work will be 
unable to gain admission. It is to be ex- 
pected that many of these students will enroll 
in college or university extension programs. 
Some institutions will set up formal degree 
programs for these students in extension cen- 
ters where they may take one or two years 
of work toward their bachelor’s degree, while 
others may not make formal programs avail- 
able but the students may take courses which 
will carry credit toward a degree when they 
do matriculate in a full-time status. 

The last factor raises the question of the 
quality and level of college courses taken 
through university extension programs. In 
the past there has been a more or less covert 
stigma attached to extension work. Some 
colleges and universities will not grant credit 
for courses taken in extension. Many institutions limit the number of graduate credits
which may be taken through extension. Ani- 
keeff (1) in a recent study reported in this 
Journal questioned the advisability of grant- 
ing credit for any courses taken in extension 
work at the undergraduate level. To date 
there has been very little research done to 
clarify the situation. 


Procedure 


In 1955 the Extension Division of the University 
of Rhode Island initiated a new program leading to 
the Bachelor's Degree in Business Administration. 
Students entering the program were interviewed by 
the Director of Admissions of the University and 
given the ACE Psychological Examination and The 
Cooperative Reading Comprehension Test. Sixty- 
nine students were enrolled in the initial courses of- 
fered under the new program. All applicants were 
accepted, but of this group ten were admitted in a 
nonmatriculated status pending their demonstrating 
the ability to handle university level work. The
average scores of the nonmatriculated students in 
all of the entrance tests were below the averages for 
the total group as would be expected since this was 
one of the criteria used in differentiating between 
the matriculated and nonmatriculated groups. Up to the time of the writing of this paper the ten nonmatriculated students had maintained a flat C average. The mean age of these extension degree students was 29.8 years. Fifty-seven of the students were male and eleven were female.¹ The group was composed of 17 veterans and 51 nonveterans.

As would be expected, the educational background of the extension group was quite heterogeneous. Nineteen had no educational experience beyond high school, whereas 49 had advanced training of some type. The experience of the latter group broke down as follows: fourteen had attended college on a full-time basis, the average time of attendance being two years; 27 had taken university extension courses, the average number of courses taken per student was four; 18 had attended special or technical schools.²

The ACE and the Cooperative Reading Comprehension test scores of the extension degree students were compared with scores on the same tests achieved by a group of full-time campus students majoring in Business Administration. The campus group was composed of 120 students, the total freshman class entering the College of Business Administration of the University of Rhode Island in 1955. The mean age of the campus students was 19.6 years. Ninety-seven were males and 23 were females. The group was composed of 86 nonveterans and 34 veterans. Eleven were admitted with advanced standing.

Since the usefulness of the Q (quantitative) and L (linguistic) scores on the ACE test has been seriously questioned in the literature for selection or guidance purposes, only the T (total) score was used for comparative purposes on the ACE test. On the Cooperative Reading Comprehension Test the V (vocabulary), S (speed of comprehension) and L (level of comprehension) scores were used. Guilford's (3) formula for the computation of the standard error of the difference between means was used.

¹ It will be noted that although 69 students started the program, complete background information was available on only 68.

² Total greater than 100% since some students fell in more than one of these educational categories.


Table 1

Comparison of Scores of Campus and Extension Students on the ACE Psychological Examination and the Cooperative Reading Comprehension Tests

                ACE Scores            Cooperative Reading Comprehension Scores
                T Score               V Score               S Score               L Score
Statistic       Campus    Exten.      Campus    Exten.      Campus    Exten.      Campus    Exten.
N               119       67          82        66          82        66          82        66
M               112.9     107.3       54.0      63.4        57.9      57.8        56.4      58.8
σ               19.7      22.0        7.0       10.6        8.8       8.7         6.0       7.3
σ diff.         3.2                   1.5                   1.45                  1.11
t               1.75                  [illegible]*          [illegible]           2.16**

* Significant at 1% level of confidence.
** Significant at 5% level of confidence.
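
The standard error of the difference between means can be illustrated with a short sketch. It assumes the uncorrelated-samples form sqrt(σ₁²/N₁ + σ₂²/N₂); the source cites Guilford (3) without printing the formula, so this form and the function names are assumptions. The figures used are the ACE T-score values from Table 1.

```python
import math


def se_of_difference(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent means.

    Assumption: sqrt(sd1^2 / n1 + sd2^2 / n2); the paper cites
    Guilford (3) but does not reproduce the formula.
    """
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


def critical_ratio(m1, sd1, n1, m2, sd2, n2):
    """Difference between means divided by its standard error."""
    return (m1 - m2) / se_of_difference(sd1, n1, sd2, n2)


# ACE total scores: campus group first, extension group second.
se = se_of_difference(19.7, 119, 22.0, 67)
t = critical_ratio(112.9, 19.7, 119, 107.3, 22.0, 67)
print(round(se, 2), round(t, 2))
```

Under this assumption the standard error comes to about 3.2 and the ratio to about 1.7, in line with the t of 1.75 reported for the ACE comparison.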


Results 


Referring to Table 1 the N for a particular 
test score does not always correspond with 
the N for the total group, since test scores 
for all students were not available. 

A difference significant at the 1% level of 
confidence was found between the two groups 
on the vocabulary test score with the exten- 
sion students attaining the higher scores. 

The difference between the two groups in 
speed of comprehension was very small and 
not found to be significant. 

The t test of significance showed a difference between the two groups on the level of
comprehension test significant at the 5% 
level of confidence; again the higher scores 
were attained by the extension students. 

Although the difference between the mean 
total scores on the ACE test appeared rela- 
tively large by inspection, it was not found 
to be significant. The mean ACE score for 
both campus and extension students is higher 
than the national mean of 106.8 for four-year 
colleges. 

As an additional check on the existence of 
significant differences between the perform- 
ance of campus and extension students, the 
chi-square test was applied to the same data. 
This was done especially to check the differ- 
ence between the two groups on the total 
score of the ACE test. In all cases the chi-square test agreed with results reported in Table 1, with the exception of the L score. The chi-square test did not show a significant difference between the two groups on the level of comprehension test score, whereas the t value obtained was significant at the 5% level of confidence.


Discussion 


In the past it has been assumed by many 
educators that the extension student is su- 
perior to the campus student in terms of mo- 
tivation to learn; but that the campus stu- 
dent is superior to the extension student in 
natural ability or aptitude for college work. 

However, the results reported above sug- 
gest that those extension students working 
toward a college degree have as much apti- 
tude for college work as campus students and, 
in some specialized areas, may have some ad- 
vantage over campus students. Hence, tak- 
ing into consideration the level of ability of 
extension students plus their strong motiva- 
tion to learn, it follows that they should be 
able to do college level work, and they should 
be entitled to receive college credit for that 
work. Of course it should be emphasized 
that this paper has considered only those stu- 
dents working for a college degree through a 
university extension program, and no infer- 
ences may be drawn from it for extension 
students falling into other educational cate- 
gories. 

In view of the ever-increasing number of 
adults enrolling in extension courses at vari- 
ous educational levels it would appear that 
more research is needed in this area to deter- 
mine the type of educational credit appropri- 
ate to the different programs which are pres- 
ently evolving. 


Summary 

The ACE Psychological Examination and the Cooperative Reading Comprehension Examination were given to 68 University of Rhode Island extension students working for a degree in business administration and 119 University of Rhode Island college freshmen enrolled in business administration on the campus at Kingston.

No significant difference was found between the mean scores of the two groups for the total score on the ACE test. It was found that the mean ACE score for both the campus and extension groups was higher than the national mean of 106.8 for four-year colleges. A significant difference, at the 1% level of confidence, was found between the vocabulary scores of the extension students and the campus students. Extension students attained the higher scores.

No significant difference was found between the speed of comprehension scores for extension and campus students.

On the level of comprehension test a difference was found which was significant at the 5% level of confidence. Again extension students attained the higher scores.

Since any differences which did exist between the two groups on these tests seemed to be in favor of the extension students, it might be assumed that extension students who were enrolled in a program leading to a college degree were capable of doing college level work and hence entitled to credit for that work. It was pointed out that these conclusions are contrary to the attitude held by some educators who hold that college credit should not be granted for work taken in an extension program.

Received May 4, 1956.


References 


1. Anikeeff, A. M. Scholastic achievement of extension and regular college students. J. appl. Psychol., 1954, 38, 171-173.

2. David, L. 35,000,000 grownups are back in school. This Week Magazine, September 11, 1955, 38-42.

3. Guilford, J. P. Fundamental statistics in psychology and education. New York: McGraw-Hill, 1942.

4. McNemar, Q. Psychological statistics. New York: Wiley, 1949.

5. Thompson, R. B. The impending tidal wave of students. Columbus, Ohio: Amer. Ass. Collegiate Registrars and Admissions Officers, 1954.








Journal of Applied Psychology 
Vol. 41, No. 1, 1957 


Rater Reliability and “Judgmental Demoralization”


A. W. Bendig 


University of Pittsburgh 


Cummings (4) has suggested that the reli- 
ability and validity of ratings may be re- 
duced by the phenomenon that he calls 
“judgmental demoralization and fatigue.” A 
previous paper (3) has pointed out that two 
different effects are implied by his discussion 
and deduced operational characteristics of the 
phenomena. Judgmental demoralization results when the total rating task presented to
the judge is too extensive or complex. This 
suggested phenomenon should be a monotoni- 
cally increasing function of (a) the size or 
complexity of the individual stimuli presented 
to the judge for rating, and (b) the total 
number of stimuli that the S is aware he is 
required to rate. The demoralization phe- 
nomenon can be assessed by comparing the 
reliability of the first sets of ratings from 
groups of Ss who are presented with different 
total numbers of stimuli to be rated. For ex- 
ample, the reliability of the first ten ratings 
for Ss who are presented with a total of 60 
stimuli should be lower than the reliability of 
the first ten ratings of Ss receiving a total of 
30 stimuli because of demoralization effects. 

A previous study (3) investigated the possible effect of judgmental fatigue on the reliability of preference ratings of common foods.
Judgmental demoralization was held constant 
by presenting the same number of foods to be 
rated to each S and the total list of 45 stimuli 
divided into sublists differing in the ordinal 
position of the foods in the series of judg- 
ments made by the S. Measures of rater re- 
liability (average intercorrelation among the 
Ss in rating single foods) and rater bias (cor- 
relation coefficient representing the individual 
Ss’ tendency to rate single foods high or low) 
were computed. No relationship was found 
between rater reliability or rater bias and the 
ordinal position of the sublist in the series of 
ratings, i.e., the last foods were rated just as 
reliably and with no more bias than were the 
first foods. Judgmental fatigue did not influence the reliability or bias of food-preference ratings when the number of ratings per S was
45. However, this study did not answer the 
question whether fatigue effects upon reli- 
ability would appear if greater or lesser num- 
bers of judgments were required of the S, nor 
did it attempt to investigate the effect of de- 
moralization or the possible interaction be- 
tween fatigue and demoralization. 


Procedure 


Stimuli. A list of the names of 90 common foods had been previously assembled (3). The food names were numbered and six lists, varying in the total number of foods contained in the list, were selected by drawing from a table of random numbers. Two of these lists included 30 foods, two contained 45 foods, while the last two lists were of 60 food names. Although there was considerable overlap between the six lists in the foods included, none of the foods occurred twice on the same list.

Scale. A 9-category food-preference rating scale was constructed using five verbal anchors. Anchors A, C, E, G, and I from a previous study (2, p. 37) were used to define the first, third, fifth, seventh, and ninth categories on the scale with the remaining categories left unanchored. This scale is similar to that used in a previous study (3). The scale, the rater instructions, and the food lists to be rated were mimeographed on single sheets for distribution to the Ss.

Subjects. A total of 120 undergraduate students enrolled in psychology classes served as raters and were randomly divided into six groups of 20 Ss each. Each of the Ss received one of the six lists, containing either 30, 45, or 60 foods, and rated the stimuli as to how well the S liked each food. The ratings were collected during a class period and the assignment of the different lists to the Ss was random.


Results 


The ratings of two sublists containing ten 
foods each were selected for analysis from the 
total number of stimuli rated by each of the 
six groups of Ss. The first to the tenth foods 
on each list comprised the first sublist, while 
the foods occurring in the 21st to 30th ordinal 
positions on the total lists were included in 
the second sublist. As noted above, judg- 
mental demoralization would be expected to 
encourage intergroup differences in rater reli- 





Rater Reliability and “Judgmental Demoralization” 


Table 1 


Analyses of Variance of Transformed Rater Reliability and Bia 


Source of Variation 


Total 

Between Groups 
Number of Stimuli 
GRPS within NS 

Within Groups 
Position of Stimuli 
Interaction: NS &K PS 
Pooled Interactions 

GRPS within NS * PS 


ability on both the first and second food sub 
lists, while judgmental fatigue should affect 
intragroup differences in reliability between 
the two sublists occupying different ordinal 
positions among the total series of rating 
judgments. 

Each of the twelve matrices of ratings (six 
rater groups and two sublists of foods) was
analyzed by two-criteria analysis of variance
procedures and mean squares for foods, raters,
and error (interaction of foods and raters)
obtained. These mean squares were converted
into measures of rater reliability and
rater bias by the formulas previously used
(3) and as described by Ebel (5). All of
the twelve foods' mean squares and their
derived reliability coefficients were significant at
the .01 level, indicating that the groups of
Ss were able to discriminate preference
differences between the foods on each sublist,
while 9 of the 12 raters' mean squares and
their derived bias coefficients were significant
at the .01 level and one more at the .05
level.¹ These reliability and bias measures
were converted into an approximately
normally distributed variate by the common
r-to-z transformation (7, p. 219) and two
analyses of variance, one for the reliability
measures and a second on the bias
coefficients, were computed following the procedure
described by Edwards (6, pp. 288-296) for
repeated measurements from independent
groups. Summaries of these analyses of
variance can be found in Table 1. The F
ratios for the two main effects of the total
number of stimuli rated by an S and the
ordinal position of the foods in the list were
not significant for either the reliability or
bias measures and the interactions between
these two sources of variance were similarly
insignificant.

¹ A table containing the rater reliability and bias
coefficients for each of the six rater groups has been
deposited with the American Documentation
Institute. Order Document No. 5082 from the ADI
Auxiliary Publications Project, Photoduplication Service,
Library of Congress, Washington 25, D. C.,
remitting in advance $1.25 for microfilm or $1.25 for
photocopies. Make checks payable to Chief,
Photoduplication Service, Library of Congress.

[Table 1. Summaries of the analyses of variance
(mean squares) for the rater reliability and rater
bias coefficients. Table entries are not legible in
the source.]
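The conversion from mean squares to a reliability coefficient and then to z can be sketched as follows. This is a minimal illustration, not the computation actually used in the study: it assumes the intraclass form for the reliability of a single rating given by Ebel (5), with rater variance removed from error, and the ratings matrix and function names are hypothetical. The rater bias coefficient is omitted because its formula is not given in this section.

```python
import math

def two_way_mean_squares(ratings):
    """Mean squares for foods, raters, and error (foods x raters interaction).

    ratings[i][j] is the rating given to food i by rater j, one rating per cell.
    """
    n_foods = len(ratings)
    n_raters = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n_foods * n_raters)
    food_means = [sum(row) / n_raters for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n_foods for j in range(n_raters)]

    ss_foods = n_raters * sum((m - grand_mean) ** 2 for m in food_means)
    ss_raters = n_foods * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_foods - ss_raters

    ms_foods = ss_foods / (n_foods - 1)
    ms_raters = ss_raters / (n_raters - 1)
    ms_error = ss_error / ((n_foods - 1) * (n_raters - 1))
    return ms_foods, ms_raters, ms_error

def single_rating_reliability(ms_foods, ms_error, n_raters):
    """Intraclass reliability of one rater's ratings (assumed form, after Ebel)."""
    return (ms_foods - ms_error) / (ms_foods + (n_raters - 1) * ms_error)

def fisher_z(r):
    """Fisher's r-to-z transformation, z = atanh(r)."""
    return math.atanh(r)

# Hypothetical data: 4 foods rated by 3 raters (not taken from the study).
ratings = [
    [7, 6, 8],
    [5, 5, 6],
    [3, 2, 4],
    [8, 7, 9],
]
ms_foods, ms_raters, ms_error = two_way_mean_squares(ratings)
r = single_rating_reliability(ms_foods, ms_error, n_raters=len(ratings[0]))
z = fisher_z(r)
print(f"MS foods={ms_foods:.4f}  MS raters={ms_raters:.4f}  MS error={ms_error:.4f}")
print(f"reliability r={r:.4f}  transformed z={z:.4f}")
```

The z values, rather than the coefficients themselves, would then serve as the entries for the analyses of variance.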
Mean untransformed reliability and bias
measures were obtained by averaging the
transformed measures and converting these
mean z values into correlation coefficients.
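The averaging step can be illustrated with a short sketch: each coefficient is transformed to z, the z values are averaged, and the mean z is converted back to the correlation scale. The function name and the example coefficients are hypothetical and are not values from the study.

```python
import math

def mean_coefficient(coefficients):
    """Average correlation-type coefficients through Fisher's z."""
    z_values = [math.atanh(r) for r in coefficients]  # r-to-z
    mean_z = sum(z_values) / len(z_values)            # average on the z scale
    return math.tanh(mean_z)                          # z-to-r

# Hypothetical coefficients for illustration only.
group_coefficients = [0.15, 0.22, 0.31]
mean_r = mean_coefficient(group_coefficients)
print(f"mean coefficient = {mean_r:.3f}")
```

Averaging on the z scale rather than on the raw coefficients is the design choice here; for small coefficients such as those in this study the two averages differ very little.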


Table 2

Average Rater Reliability and Bias Coefficients as a
Function of the Total Number of Stimuli
Rated and the Ordinal Position
of Stimuli in the List

[Rows: Number of Stimuli (30, 45, 60). Columns:
Rater Reliability by Position of Stimuli (1-10,
21-30, Mean) and Rater Bias by Position of Stimuli
(1-10, 21-30, Mean). Cell values are not legible in
the source.]







The average reliability and bias coefficients 
for each of the number of stimuli and posi- 
tion of stimuli combinations can be found in 
Table 2. No consistent relationship between 
the total number of stimuli rated and reli- 
ability is evident. However, Ss rating a to- 
tal of 45 foods tended to show slightly more 
rater bias than those rating 30 or 60 foods, 
but this slight rise in rater bias with 45 
stimuli was not statistically significant. 
Raters tended to be slightly more reliable on 
the second sublist of foods rated, but again 
this relation is neither consistent nor signifi- 
cant. Mean rater bias was equal for both 
the first and second sublists of foods. The 
reliability and bias measures in Table 2 are 
of about the same magnitude as those in a 
previous study that utilized similar proce- 
dures (3). 


Discussion 


The analyses reported in Table 1 provide
tests of the effect of judgmental fatigue and
judgmental demoralization upon the reliability
of ratings. The mean square for the number
of stimuli presented to the Ss (30, 45, or
60) should be significantly large if
demoralization induced by the size of the total rating
task required of a judge affects his reliability,
while the mean square for the ordinal posi- 
tion of the stimuli in the series should be 
sizable if fatigue is operative in this rating 
situation. As noted above, neither of these 
sources of variance was significant in the 
analysis of the rater reliability or rater bias 
measures, nor was the interaction of these 
two experimental variables statistically sig- 
nificant. Inspection of the average coeffi- 
cients in Table 2 confirms these analyses. 
We must conclude that within the parameters 
of this study fatigue and demoralization phe- 
nomena had no significant effects upon rater 
reliability and bias measures. 

The negative results in regard to “fatigue” 
confirm those in a previous study (3). Ap- 
parently the motivational level of the raters 
is sufficiently high so that a loss in motivation
that might result in a loss in reliability
for later ratings does not occur. However,
the number of stimuli presented to the rater 
in this and the previous study is not excessive 
(maximum of 60) and fatigue or demoraliza- 
tion effects may only be operative when 
greater numbers of judgments are required 
from the Ss. 


Summary 


Undergraduate student Ss (N = 120) were
divided into six groups and asked to rate lists
of food names for preference value. Two of
the groups rated a total of 30 foods, two
groups rated 45 foods, and the remaining two
rated 60 foods with different lists of foods
used with each group. Measures of rater
reliability and rater bias were computed for
foods in ordinal positions 1 to 10 and 21 to
30 and the transformed measures analyzed
by analysis of variance. Results indicated
that the main variables of the total number
of stimuli rated by the Ss and the ordinal
position of the foods in the list had no
significant effect on either reliability or bias. It
was concluded that judgmental fatigue and
judgmental demoralization had no effect upon
the reliability of food-preference ratings.


Received October 14, 1955 


References 


1. Bendig, A. W. Reliability of short rating scales
   and the heterogeneity of the rated stimuli.
   J. appl. Psychol., 1954, 38, 167-170.
2. Bendig, A. W. Rater reliability and the
   heterogeneity of the scale anchors. J. appl.
   Psychol., 1955, 39, 37-39.
3. Bendig, A. W. Rater reliability and “judgmental
   fatigue.” J. appl. Psychol., 1955, 39, 451-454.
4. Cummings, S. T. The clinician as judge;
   judgments of adjustments from Rorschach single
   card performance. J. consult. Psychol., 1954,
   18, 243-247.
5. Ebel, R. L. Estimation of the reliability of
   ratings. Psychometrika, 1951, 16, 407-424.
6. Edwards, A. L. Experimental design in
   psychological research. New York: Rinehart, 1950.
7. Fisher, R. A. Statistical methods for research
   workers. (10th ed.) New York: Hafner, 1948.





Journal of Applied Psychology 
Vol. 41, No. 1, 1957 


A Note on a Punched-Card Method for the Solution of the Chi- 
Square Contingency Table 


Edward P. Buckley 


Naval Research Laboratory, Washington 


and George C. Widding ¹


Automatic Data Processing Branch, HDQ, U. S. Air Force, Washington 


The computational steps required for the
solution of the chi-square contingency table
are not extremely complex but can be tedious
in cases where the number of cells is large
and the cell entries are three-digit numbers
or larger. The present article describes a
punched-card routine which enables rapid
solution of individual contingency tables.

Other methods are available (1) for
solution of such tables when they occur in large
numbers at the same time. However, the
method presented here is suitable for
laboratories where individual contingency tables
are frequent and require rapid solution.

Using the method suggested here, it is only
necessary for the research person to
rearrange his contingency table data in list form.
Only a few minutes are required for the
machine installation to work from the cell
entries to the chi-square value, if a board is
permanently wired for this purpose.

Consider the example of a contingency
table given by Hoel (2) and shown here as
Table 1.

Using the punched-card method suggested
here, the experimenter requiring the
chi-square value for this table need merely
submit the data as arranged in Table 2 to his
machine room.


Table 1

                    Marriage Adjustment Score
                 Very                     Very
Education        Low     Low     High     High     Total

College           18      29       70      115       232
High school       17      28       30       41       116
Grades only       11      10       11       20        52

Total             46      67      111      176       400


¹ Formerly of Naval Research Laboratory.


Table 2

Arrangement of Chi-Square Data for Machine Solution

  Cell        Row       Column     Grand
 Entries      Sums       Sums       Sum
   (A)        (B)        (C)       (D)*

    18        232         46        400
    17        116         46        400
    11         52         46        400
    29        232         67        400
    28        116         67        400
    10         52         67        400
    70        232        111        400
    30        116        111        400
    11         52        111        400
   115        232        176        400
    41        116        176        400
    20         52        176        400

* The symbols A, B, C, D are for the machine
(see planning chart, Fig. 1).
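The rearrangement of a contingency table into the list form of Table 2 can be sketched in a few lines. The function name `to_list_form` is hypothetical, and the cell for High school / High is taken as 30, the value implied by the row and column totals of Table 1.

```python
def to_list_form(table):
    """Return one (A, B, C, D) line per cell, column by column.

    A = cell entry, B = row sum, C = column sum, D = grand sum.
    """
    row_sums = [sum(row) for row in table]
    column_sums = [sum(column) for column in zip(*table)]
    grand_sum = sum(row_sums)
    lines = []
    for j, c in enumerate(column_sums):      # columns in order, as in Table 2
        for i, b in enumerate(row_sums):     # rows within each column
            lines.append((table[i][j], b, c, grand_sum))
    return lines

# Cell entries of Table 1 (rows: College, High school, Grades only).
hoel_table = [
    [18, 29, 70, 115],
    [17, 28, 30, 41],
    [11, 10, 11, 20],
]
list_form = to_list_form(hoel_table)
for line in list_form:
    print("%5d %5d %5d %5d" % line)
```

Each printed line corresponds to one card to be punched for the machine room.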


The machine room will then compute the
expected values, compute the chi-square value
for each line, and sum these to get the
chi-square value for the contingency table.
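The arithmetic performed on each line can be sketched as follows, as a plain-software analogue of the machine procedure rather than a description of the wiring. The expected value for a line is taken as row sum times column sum divided by grand sum, and the function and variable names are hypothetical.

```python
# One (A, B, C, D) tuple per line of Table 2:
# cell entry, row sum, column sum, grand sum.
cards = [
    (18, 232, 46, 400), (17, 116, 46, 400), (11, 52, 46, 400),
    (29, 232, 67, 400), (28, 116, 67, 400), (10, 52, 67, 400),
    (70, 232, 111, 400), (30, 116, 111, 400), (11, 52, 111, 400),
    (115, 232, 176, 400), (41, 116, 176, 400), (20, 52, 176, 400),
]

def line_chi_square(a, b, c, d):
    """Chi-square contribution of one cell: (observed - expected)^2 / expected."""
    expected = b * c / d
    return (a - expected) ** 2 / expected

# Value for each line, then the sum over all lines.
line_values = [line_chi_square(*card) for card in cards]
chi_square = sum(line_values)

for card, value in zip(cards, line_values):
    print("%5d %5d %5d %5d   %8.4f" % (card + (value,)))
print("chi-square for the table: %.2f" % chi_square)
```

The sum over the twelve lines plays the role of the value punched into the trailer card.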

The machine room's procedure is as fol- 
lows: 

A card would be punched for each line. It
will be seen that the card contains the cell
entry and appropriate row, column, and grand
sums. A card containing an “x” in column
80 is placed at the rear of the cell-entry
cards. This trailer card will receive the ΣR
(final chi-square) value. If there are several
tables to be calculated together, it may be
desirable to punch some identification in the
trailer card.

The 602A steps are simple. The panel
maintained at the machine room is inserted
into the machine. An “x” 80 card is passed
through the machine to clear it and then the
deck prepared above is inserted. The R
(chi-square) values for the individual cell
entries will be punched in the respective cards
and the sum of R (Σ of chi square) into the
trailer card. Thus, the trailer card contains
the chi-square value for the table.

[Fig. 1. Calculating punch planning chart for
chi-square solution (Calculating Punch Type 602A).]

The 602A panel is wired according to the 
planning chart (Fig. 1) to perform the fol- 
lowing computation: 


$$R = \frac{\left(A - \dfrac{BC}{D}\right)^{2}}{\dfrac{BC}{D}}$$
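As a worked instance of this computation, assuming the symbols have the meanings given in Table 2, the first line of that table (A = 18, B = 232, C = 46, D = 400) gives

$$\frac{BC}{D} = \frac{232 \times 46}{400} = 26.68, \qquad R = \frac{(18 - 26.68)^{2}}{26.68} = \frac{75.3424}{26.68} \approx 2.82$$

The same arithmetic is repeated for each of the remaining lines, and the R values are summed.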


Summary 


The advantage of this procedure over other
punched-card methods for chi square is that
it makes practical the machine solution of
single contingency tables. Some other
methods would be suitable only when a rather
large number of contingency tables
simultaneously become available for processing.
With this system the punched-card
installation can merely store a board wired for this
situation which will be used whenever the
occasion arises. The machine-room time
required once the board has been wired is
minimal.


Received April 5, 1956 


References 


1. Gruenberger, F. Diagrams in punched card
   computing. Madison: Univer. of Wisconsin Press,
   1954.
2. Hoel, P. G. Introduction to mathematical
   statistics. New York: Wiley, 1946.








ERRATUM 


In the volume table of contents for 1956 (this Journal, 1956, 40, vii), an article
by W. L. Taylor, entitled “ ‘Cloze’ Readability Scores as Indices of Individual
Differences in Comprehension and Aptitude,” is incorrectly listed as appearing on
page 378 of the 1956 volume. The article is instead published in this issue
(February, 1957) beginning at page 19.

























